Sweep: about php-dna HOT 1 CLOSED

liberu-genealogy commented on May 27, 2024
Sweep:

from php-dna.

Comments (1)

sweep-ai commented on May 27, 2024

🚀 Here's the PR! #116

See Sweep's progress at the progress dashboard!
💎 Sweep Pro: I'm using GPT-4. You have unlimited GPT-4 tickets. (tracking ID: 4fe27ff212)

GitHub Actions failed

The sandbox appears to be unavailable or down.


Step 1: 🔎 Searching

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Here are the code snippets I think are relevant, in decreasing order of relevance. If a file is missing from this list, you can mention its path in the ticket description.

<?php
namespace Dna\Snps;
use Countable;
use Dna\Resources;
use Dna\Snps\IO\IO;
use Dna\Snps\IO\Reader;
use Dna\Snps\IO\Writer;
use Iterator;
// You may need to find alternative libraries for numpy, pandas, and snps in PHP, as these libraries are specific to Python
// For numpy, consider using a library such as MathPHP: https://github.com/markrogoyski/math-php
// For pandas, you can use DataFrame from https://github.com/aberenyi/php-dataframe, though it is not as feature-rich as pandas
// For snps, you'll need to find a suitable PHP alternative or adapt the Python code to PHP
// import copy // PHP has no 'copy' module: arrays are copied by value on assignment, while objects are assigned by handle, so use `clone` when a separate copy is needed
// from itertools import groupby, count // PHP has built-in support for array functions that can handle these operations natively
// import logging // For logging in PHP, you can use Monolog: https://github.com/Seldaek/monolog
// use Monolog\Logger;
// use Monolog\Handler\StreamHandler;
// import os, re, warnings
// PHP has built-in support for file operations, regex, and error handling, so no need to import these modules
// import numpy as np // See the note above about using MathPHP or another PHP library for numerical operations
// import pandas as pd // See the note above about using php-dataframe or another PHP library for data manipulation
// from pandas.api.types import CategoricalDtype // If using php-dataframe, check documentation for similar functionality
// For snps.ensembl, snps.resources, snps.io, and snps.utils, you'll need to find suitable PHP alternatives or adapt the Python code
// from snps.ensembl import EnsemblRestClient
// from snps.resources import Resources
// from snps.io import Reader, Writer, get_empty_snps_dataframe
// from snps.utils import Parallelizer
// Set up logging
// $logger = new Logger('my_logger');
// $logger->pushHandler(new StreamHandler('php://stderr', Logger::DEBUG));
class SNPs implements Countable, Iterator
{
private array $_source = [];
private array $_snps = [];
private int $_build = 0;
private ?bool $_phased = null;
private ?bool $_build_detected = null;
private ?Resources $_resources = null;
private ?string $_chip = null;
private ?string $_chip_version = null;
private ?string $_cluster = null;
private int $_position = 0;
private array $_keys = [];
private array $_duplicate;
private array $_discrepant_XY;
private array $_heterozygous_MT;
/**
* SNPs constructor.
*
* @param string $file Input file path
* @param bool $only_detect_source Flag to indicate whether to only detect the source
* @param bool $assign_par_snps Flag to indicate whether to assign par_snps
* @param string $output_dir Output directory path
* @param string $resources_dir Resources directory path
* @param bool $deduplicate Flag to indicate whether to deduplicate
* @param bool $deduplicate_XY_chrom Flag to indicate whether to deduplicate XY chromosome
* @param bool $deduplicate_MT_chrom Flag to indicate whether to deduplicate MT chromosome
* @param bool $parallelize Flag to indicate whether to parallelize
* @param int $processes Number of processes to use for parallelization
* @param array $rsids Array of rsids
*/
public function __construct(
private $file = "",
private bool $only_detect_source = False,
private bool $assign_par_snps = False,
private string $output_dir = "output",
private string $resources_dir = "resources",
private bool $deduplicate = True,
private bool $deduplicate_XY_chrom = True,
private bool $deduplicate_MT_chrom = True,
private bool $parallelize = False,
private int $processes = 1, // cpu count
private array $rsids = [],
private $ensemblRestClient = null,
) //, $only_detect_source, $output_dir, $resources_dir, $parallelize, $processes)
{
// $this->_only_detect_source = $only_detect_source;
$this->setSNPs(IO::get_empty_snps_dataframe());
$this->_duplicate = IO::get_empty_snps_dataframe();
$this->_discrepant_XY = IO::get_empty_snps_dataframe();
$this->_heterozygous_MT = IO::get_empty_snps_dataframe();
// $this->_discrepant_vcf_position = $this->get_empty_snps_dataframe();
// $this->_low_quality = $this->_snps->index;
// $this->_discrepant_merge_positions = new DataFrame();
// $this->_discrepant_merge_genotypes = new DataFrame();
$this->_source = [];
// $this->_phased = false;
$this->_build = 0;
$this->_build_detected = false;
// $this->_output_dir = $output_dir;
$this->_resources = new Resources($resources_dir);
// $this->_parallelizer = new Parallelizer($parallelize, $processes);
$this->_cluster = "";
$this->_chip = "";
$this->_chip_version = "";
$this->ensemblRestClient = $ensemblRestClient ?? new EnsemblRestClient("https://api.ncbi.nlm.nih.gov", 1);
if (!empty($file)) {
$this->readFile();
}
}
public function count(): int
{
return $this->get_count();
}
public function current(): mixed
{
// Each element is a SNP array, not a SNPs object; resolve it through the key list
// so iteration also works when the array is keyed by rsid
return $this->_snps[$this->_keys[$this->_position]];
}
public function key(): int
{
return $this->_position;
}
public function next(): void
{
++$this->_position;
}
public function rewind(): void
{
$this->_position = 0;
}
public function valid(): bool
{
return isset($this->_keys[$this->_position]);
}
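/**
* Filter the SNPs with a callback.
*
* @param callable $callback Receives each SNP and returns true to keep it
* @return array The SNPs for which the callback returned true (original keys are preserved)
*/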
public function filter(callable $callback)
{
return array_filter($this->_snps, $callback);
}
/**
* Get the value of the source property.
*
* @return string
* Data source(s) for this `SNPs` object, separated by ", ".
*/
public function getSource(): string
{
return implode(", ", $this->_source);
}
public function getAllSources(): array
{
return $this->_source;
}
/**
* Magic method to handle property access.
*
* @param string $name
* The name of the property.
*
* @return mixed
* The value of the property.
*/
public function __get(string $name)
{
$getter = 'get' . ucfirst($name);
if (method_exists($this, $getter)) {
return $this->$getter();
}
return null; // Or throw an exception for undefined properties
}
public function setSNPs(array $snps)
{
$this->_snps = $snps;
$this->_keys = array_keys($snps);
}
protected function readFile()
{
// print_r($this->file);
$d = $this->readRawData($this->file, $this->only_detect_source, $this->rsids);
$this->setSNPs($d["snps"]);
$this->_source = (strpos($d["source"], ", ") !== false) ? explode(", ", $d["source"]) : [$d["source"]];
$this->_phased = $d["phased"];
$this->_build = $d["build"] ?? 0; // $_build is typed int, so fall back to 0 rather than null
$this->_build_detected = !empty($d["build"]);
// echo "HERE\n";
// var_dump($d["build"]);
// var_dump($this->_build_detected);
// $this->_cluster = $d["cluster"];
// if not self._snps.empty:
// self.sort()
// if deduplicate:
// self._deduplicate_rsids()
// # use build detected from `read` method or comments, if any
// # otherwise use SNP positions to detect build
// if not self._build_detected:
// self._build = self.detect_build()
// self._build_detected = True if self._build else False
// if not self._build:
// self._build = 37 # assume Build 37 / GRCh37 if not detected
// else:
// self._build_detected = True
if (!empty($this->_snps)) {
$this->sort();
if ($this->deduplicate)
$this->_deduplicate_rsids();
// use build detected from `read` method or comments, if any
// otherwise use SNP positions to detect build
if (!$this->_build_detected) {
$this->_build = $this->detect_build();
$this->_build_detected = $this->_build ? true : false;
if (!$this->_build) {
$this->_build = 37; // assume Build 37 / GRCh37 if not detected
} else {
$this->_build_detected = true;
}
}
// if ($this->assign_par_snps) {
// $this->assignParSnps();
// $this->sort();
// }
// if ($this->deduplicate_XY_chrom) {
// if (
// ($this->deduplicate_XY_chrom === true && $this->determine_sex() == "Male")
// || ($this->determine_sex(chrom: $this->deduplicate_XY_chrom) == "Male")
// ) {
// $this->deduplicate_XY_chrom();
// }
// }
// if ($this->deduplicate_MT_chrom) {
// echo "deduping yo...\n";
// $this->deduplicate_MT_chrom();
// }
}
}
protected function readRawData($file, $only_detect_source, $rsids = [])
{
$r = new Reader($file, $only_detect_source, $this->_resources, $rsids);
return $r->read();
}
/**
* Get the SNPs as an array.
*
* @return array The SNPs array
*/
public function getSnps(): array
{
return $this->_snps;
}
/**
* Status indicating if build of SNPs was detected.
*
* @return bool True if the build was detected, False otherwise
*/
public function isBuildDetected(): bool
{
return $this->_build_detected;
}
/**
* Get the build number associated with the data.
*
* @return mixed The build number
*/
public function getBuild()
{
return $this->_build;
}
public function setBuild($build)
{
$this->_build = $build;
}
/**
* Detected deduced genotype / chip array, if any, per computeClusterOverlap.
*
* @return string Detected chip array, else an empty string.
*/
public function getChip()
{
if (empty($this->_chip)) {
$this->computeClusterOverlap();
}
return $this->_chip;
}
/**
* Detected genotype / chip array version, if any, per
* computeClusterOverlap.
*
* Chip array version is only applicable to 23andMe (v3, v4, v5) and AncestryDNA (v1, v2) files.
*
* @return string Detected chip array version, e.g., 'v4', else an empty string.
*/
public function getChipVersion()
{
if (!$this->_chip_version) {
$this->computeClusterOverlap();
}
return $this->_chip_version;
}
/**
* Compute overlap with chip clusters.
*
* Chip clusters, which are defined in [1]_, are associated with deduced genotype /
* chip arrays and DTC companies.
*
* This method also sets the values returned by the `cluster`, `chip`, and
* `chip_version` properties, based on max overlap, if the specified threshold is
* satisfied.
*
* @param float $cluster_overlap_threshold
* Threshold for cluster to overlap this SNPs object, and vice versa, to set
* values returned by the `cluster`, `chip`, and `chip_version` properties.
*
* @return array
* List of rows, one per cluster, each an associative array with the following keys:
* - `cluster_id`: Identifier of the cluster from [1]_
* - `company_composition`: DTC company composition of associated cluster from [1]_
* - `chip_base_deduced`: Deduced genotype / chip array of associated cluster from [1]_
* - `snps_in_cluster`: Count of SNPs in cluster
* - `snps_in_common`: Count of SNPs in common with cluster (inner merge with cluster)
* - `overlap_with_cluster`: Overlap of `snps_in_common` with cluster, as a fraction between 0 and 1
* - `overlap_with_self`: Overlap of `snps_in_common` with this SNPs object, as a fraction between 0 and 1
*
* @see https://doi.org/10.1016/j.csbj.2021.06.040
* Chang Lu, Bastian Greshake Tzovaras, Julian Gough, A survey of
* direct-to-consumer genotype data, and quality control tool
* (GenomePrep) for research, Computational and Structural
* Biotechnology Journal, Volume 19, 2021, Pages 3747-3754, ISSN
* 2001-0370.
*/
public function computeClusterOverlap($cluster_overlap_threshold = 0.95)
{
$data = [
"cluster_id" => ["c1", "c3", "c4", "c5", "v5"],
"company_composition" => [
"23andMe-v4",
"AncestryDNA-v1, FTDNA, MyHeritage",
"23andMe-v3",
"AncestryDNA-v2",
"23andMe-v5, LivingDNA",
],
"chip_base_deduced" => [
"HTS iSelect HD",
"OmniExpress",
"OmniExpress plus",
"OmniExpress plus",
"Illumina GSAs",
],
"snps_in_cluster" => [0, 0, 0, 0, 0],
"snps_in_common" => [0, 0, 0, 0, 0],
];
$keys = array_keys($data);
$df = [];
foreach ($data['cluster_id'] as $index => $cluster_id) {
$entry = ['cluster_id' => $cluster_id];
foreach ($keys as $key) {
$entry[$key] = $data[$key][$index];
}
$df[] = $entry;
}
if ($this->build != 37) {
// Copy the current object (clone is shallow: array properties are copied, object properties such as $_resources stay shared)
$toRemap = clone $this;
// Call the remap method on the copied object
$toRemap->remap(37); // clusters are relative to Build 37
// Extract "chrom" and "pos" values from snps and remove duplicates
$selfSnps = [];
foreach ($toRemap->snps as $snp) {
if (
!in_array($snp["chrom"], array_column($selfSnps, "chrom")) ||
!in_array($snp["pos"], array_column($selfSnps, "pos"))
) {
$selfSnps[] = $snp;
}
}
} else {
// Extract "chrom" and "pos" values from snps and remove duplicates
$selfSnps = [];
foreach ($this->snps as $snp) {
if (
!in_array($snp["chrom"], array_column($selfSnps, "chrom")) ||
!in_array($snp["pos"], array_column($selfSnps, "pos"))
) {
$selfSnps[] = $snp;
}
}
}
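$chip_clusters = $this->_resources->get_chip_clusters();
$self_count = count($selfSnps);
foreach ($df as $i => $row) {
$cluster = $row["cluster_id"];
// Collect the "chrom:pos" keys of the SNPs that belong to this cluster
$snps_in_cluster = 0;
$cluster_positions = [];
foreach ($chip_clusters as $chip_cluster) {
if (strpos($chip_cluster["clusters"], $cluster) !== false) {
$snps_in_cluster++;
$cluster_positions[$chip_cluster["chrom"] . ":" . $chip_cluster["pos"]] = true;
}
}
// Count this object's SNPs that share a chromosome and position with the cluster
$snps_in_common = 0;
foreach ($selfSnps as $snp) {
if (isset($cluster_positions[$snp["chrom"] . ":" . $snp["pos"]])) {
$snps_in_common++;
}
}
$df[$i]["snps_in_cluster"] = $snps_in_cluster;
$df[$i]["snps_in_common"] = $snps_in_common;
// Guard against division by zero for an empty cluster or an empty SNPs object
$df[$i]["overlap_with_cluster"] = $snps_in_cluster > 0 ? $snps_in_common / $snps_in_cluster : 0.0;
$df[$i]["overlap_with_self"] = $self_count > 0 ? $snps_in_common / $self_count : 0.0;
}
// Find the row with the highest overlap with its cluster
$max_overlap = 0;
foreach ($df as $i => $row) {
if ($row["overlap_with_cluster"] > $df[$max_overlap]["overlap_with_cluster"]) {
$max_overlap = $i;
}
}
if (
$df[$max_overlap]["overlap_with_cluster"] > $cluster_overlap_threshold
&& $df[$max_overlap]["overlap_with_self"] > $cluster_overlap_threshold
) {
$this->_cluster = $df[$max_overlap]["cluster_id"];
$this->_chip = $df[$max_overlap]["chip_base_deduced"];
$company_composition = $df[$max_overlap]["company_composition"];
// Only derive a chip version when the detected source appears in the company composition
if (str_contains($company_composition, $this->source)) {
if ($this->source === "23andMe" || $this->source === "AncestryDNA") {
$i = strpos($company_composition, "v");
if ($i !== false) {
$this->_chip_version = substr($company_composition, $i, 2);
}
}
} else {
error_log("Detected SNPs data source not found in cluster's company composition");
}
}
return $df;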
}
/**
* Discrepant XY SNPs.
*
* Discrepant XY SNPs are SNPs that are assigned to both the X and Y chromosomes.
*
* @return array Discrepant XY SNPs
*/
public function getDiscrepantXY()
{
return $this->_discrepant_XY;
}
/**
* Get the duplicate SNPs.
*
* A duplicate SNP has the same RSID as another SNP. The first occurrence
* of the RSID is not considered a duplicate SNP.
*
* @return SNPs[] Duplicate SNPs
*/
public function getDuplicate()
{
return $this->_duplicate;
}
/**
* Count of SNPs.
*
* @param string $chrom (optional) Chromosome (e.g., "1", "X", "MT")
* @return int The count of SNPs for the given chromosome
*/
public function get_count($chrom = "")
{
return count($this->_filter($chrom));
}
protected function _filter($chrom = "")
{
if (!empty($chrom)) {
$filteredSnps = array_filter($this->_snps, function ($snp) use ($chrom) {
return $snp['chrom'] === $chrom;
});
return $filteredSnps;
} else {
return $this->_snps;
}
}
/**
* Detect build of SNPs.
*
* Use the coordinates of common SNPs to identify the build / assembly of a genotype file
* that is being loaded.
*
* Notes:
* - rs3094315 : plus strand in 36, 37, and 38
* - rs11928389 : plus strand in 36, minus strand in 37 and 38
* - rs2500347 : plus strand in 36 and 37, minus strand in 38
* - rs964481 : plus strand in 36, 37, and 38
* - rs2341354 : plus strand in 36, 37, and 38
* - rs3850290 : plus strand in 36, 37, and 38
* - rs1329546 : plus strand in 36, 37, and 38
*
* Returns detected build of SNPs, else 0
*
* References:
* 1. Yates et al. (doi:10.1093/bioinformatics/btu613),
* <http://europepmc.org/search/?query=DOI:10.1093/bioinformatics/btu613>
* 2. Zerbino et al. (doi.org/10.1093/nar/gkx1098), https://doi.org/10.1093/nar/gkx1098
* 3. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K.
* dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001
* Jan 1;29(1):308-11.
* 4. Database of Single Nucleotide Polymorphisms (dbSNP). Bethesda (MD): National Center
* for Biotechnology Information, National Library of Medicine. dbSNP accession: rs3094315,
* rs11928389, rs2500347, rs964481, rs2341354, rs3850290, and rs1329546
* (dbSNP Build ID: 151). Available from: http://www.ncbi.nlm.nih.gov/SNP/
*/
protected function detect_build(): int
{
// print_r($this->_snps);
$lookup_build_with_snp_pos = function ($pos, $s) {
foreach ($s as $index => $value) {
if ($value == $pos) {
return $index;
}
}
return 0;
};
$build = 0;
$rsids = [
"rs3094315",
"rs11928389",
"rs2500347",
"rs964481",
"rs2341354",
"rs3850290",
"rs1329546",
];
$df = [
"rs3094315" => [36 => 742429, 37 => 752566, 38 => 817186],
"rs11928389" => [36 => 50908372, 37 => 50927009, 38 => 50889578],
"rs2500347" => [36 => 143649677, 37 => 144938320, 38 => 148946169],
"rs964481" => [36 => 27566744, 37 => 27656823, 38 => 27638706],
"rs2341354" => [36 => 908436, 37 => 918573, 38 => 983193],
"rs3850290" => [36 => 22315141, 37 => 23245301, 38 => 22776092],
"rs1329546" => [36 => 135302086, 37 => 135474420, 38 => 136392261]
];
foreach ($this->_snps as $snp) {
if (in_array($snp['rsid'], $rsids)) {
$build = $lookup_build_with_snp_pos($snp['pos'], $df[$snp['rsid']]);
}
if ($build) {
break;
}
}
return $build;
}
/**
* Convert the SNPs object to a string representation.
*
* @return string The string representation of the SNPs object
*/
public function __toString()
{
if (is_string($this->file) && is_file($this->file)) {
// If the file path is a string, return SNPs with the basename of the file
return "SNPs('" . basename($this->file) . "')";
} else {
// If the file path is not a string, return SNPs with <bytes>
return "SNPs(<bytes>)";
}
}
/**
* Get the assembly of the SNPs.
*
* @return string The assembly of the SNPs
*/
public function getAssembly(): string
{
if ($this->_build === 37) {
return "GRCh37";
} elseif ($this->_build === 36) {
return "NCBI36";
} elseif ($this->_build === 38) {
return "GRCh38";
} else {
return "";
}
}
/**
* Assign PAR SNPs to the X or Y chromosome using SNP position.
*
* References:
* 1. National Center for Biotechnology Information, Variation Services, RefSNP,
* https://api.ncbi.nlm.nih.gov/variation/v0/
* 2. Yates et al. (doi:10.1093/bioinformatics/btu613),
* http://europepmc.org/search/?query=DOI:10.1093/bioinformatics/btu613
* 3. Zerbino et al. (doi.org/10.1093/nar/gkx1098), https://doi.org/10.1093/nar/gkx1098
* 4. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K.
* dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;
* 29(1):308-11.
* 5. Database of Single Nucleotide Polymorphisms (dbSNP). Bethesda (MD): National Center
* for Biotechnology Information, National Library of Medicine. dbSNP accession:
* rs28736870, rs113313554, and rs758419898 (dbSNP Build ID: 151). Available from:
* http://www.ncbi.nlm.nih.gov/SNP/
*/
protected function assignParSnps()
{
$restClient = $this->ensemblRestClient;
$snps = $this->filter(function ($snps) {
return $snps["chrom"] === "PAR";
});
foreach ($snps as $snp) {
$rsid = $snp["rsid"];
echo "rsid: $rsid\n";
if (str_starts_with($rsid, "rs")) {
$response = $this->lookupRefsnpSnapshot($rsid, $restClient);
// print_r($response);
if ($response !== null) {
// print_r($response["primary_snapshot_data"]["placements_with_allele"]);
foreach ($response["primary_snapshot_data"]["placements_with_allele"] as $item) {
// print_r($item["seq_id"]);
// var_dump(str_starts_with($item["seq_id"], "NC_000023"));
// var_dump(str_starts_with($item["seq_id"], "NC_000024"));
if (str_starts_with($item["seq_id"], "NC_000023")) {
$assigned = $this->assignSnp($rsid, $item["alleles"], "X");
// var_dump($assigned);
} elseif (str_starts_with($item["seq_id"], "NC_000024")) {
$assigned = $this->assignSnp($rsid, $item["alleles"], "Y");
// var_dump($assigned);
} else {
$assigned = false;
}
if ($assigned) {
if (!$this->_build_detected) {
$this->_build = $this->extractBuild($item);
$this->_build_detected = true;
}
break;
}
}
}
}
}
}
protected function extractBuild($item)
{
$assembly_name = $item["placement_annot"]["seq_id_traits_by_assembly"][0]["assembly_name"];
$assembly_name = explode(".", $assembly_name)[0];
return intval(substr($assembly_name, -2));
}
protected function assignSnp($rsid, $alleles, $chrom)
{
// only assign SNP if positions match (i.e., same build)
foreach ($alleles as $allele) {
$allele_pos = $allele["allele"]["spdi"]["position"];
// ref SNP positions seem to be 0-based...
// print_r($this->get($rsid)["pos"] - 1);
// echo "\n";
// print_r($allele_pos);
if ($allele_pos == $this->get($rsid)["pos"] - 1) {
$this->setValue($rsid, "chrom", $chrom);
return true;
}
}
return false;
}
public function get($rsid)
{
return $this->_snps[$rsid] ?? null;
}
public function setValue($rsid, $key, $value)
{
echo "Setting {$rsid} {$key} to {$value}\n";
$this->_snps[$rsid][$key] = $value;
}
private function lookupRefsnpSnapshot($rsid, $restClient)
{
$id = str_replace("rs", "", $rsid);
$response = $restClient->perform_rest_action("/variation/v0/refsnp/" . $id);
if (isset($response["merged_snapshot_data"])) {
// this RefSnp id was merged into another
// we'll pick the first one to decide which chromosome this PAR will be assigned to
$mergedId = "rs" . $response["merged_snapshot_data"]["merged_into"][0];
error_log("SNP id {$rsid} has been merged into id {$mergedId}"); // replace with your preferred logger
return $this->lookupRefsnpSnapshot($mergedId, $restClient);
} elseif (isset($response["nosnppos_snapshot_data"])) {
error_log("Unable to look up SNP id {$rsid}"); // replace with your preferred logger
return null;
} else {
return $response;
}
}
/**
* Sex derived from SNPs.
*
* @return string 'Male' or 'Female' if detected, else empty string
*/
public function getSex()
{
$sex = $this->determine_sex(chrom: "X");
if (empty($sex))
$sex = $this->determine_sex(chrom: "Y");
return $sex;
}
/**
* Determine sex from SNPs using thresholds.
*
* @param float $heterozygous_x_snps_threshold percentage heterozygous X SNPs; above this threshold, Female is determined
* @param float $y_snps_not_null_threshold percentage Y SNPs that are not null; above this threshold, Male is determined
* @param string $chrom use X or Y chromosome SNPs to determine sex, default is "X"
* @return string 'Male' or 'Female' if detected, else empty string
*/
public function determine_sex(
$heterozygous_x_snps_threshold = 0.03,
$y_snps_not_null_threshold = 0.3,
$chrom = "X"
) {
if (!empty($this->_snps)) {
if ($chrom === "X") {
return $this->_determine_sex_X($heterozygous_x_snps_threshold);
} elseif ($chrom === "Y") {
return $this->_determine_sex_Y($y_snps_not_null_threshold);
}
}
return "";
}
public function _determine_sex_X($threshold)
{
$x_snps = $this->get_count("X");
if ($x_snps > 0) {
if (count($this->heterozygous("X")) / $x_snps > $threshold) {
return "Female";
} else {
return "Male";
}
} else {
return "";
}
}
public function _determine_sex_Y($threshold)
{
$y_snps = $this->get_count("Y");
if ($y_snps > 0) {
if (count($this->notnull("Y")) / $y_snps > $threshold) {
return "Male";
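
The build detection described in the `detect_build()` docblock above can be tried in isolation. The following is a minimal sketch, not the repository's implementation: it reuses two rows of the position table from the snippet, while the function name `detectBuildFromPositions`, the constant `BUILD_POSITIONS`, and the placeholder rsid `rs0000001` are assumptions made for illustration.

```php
<?php
// Known positions of two reference SNPs per build (subset of the table in detect_build()).
const BUILD_POSITIONS = [
    'rs3094315' => [36 => 742429, 37 => 752566, 38 => 817186],
    'rs2341354' => [36 => 908436, 37 => 918573, 38 => 983193],
];

/**
 * Hypothetical standalone helper: returns 36, 37 or 38, or 0 if nothing matches.
 *
 * @param array<int, array{rsid: string, pos: int}> $snps
 */
function detectBuildFromPositions(array $snps): int
{
    foreach ($snps as $snp) {
        $positions = BUILD_POSITIONS[$snp['rsid']] ?? null;
        if ($positions === null) {
            continue; // not one of the reference SNPs
        }
        // array_search() returns the key (the build number) whose value equals the position
        $build = array_search((int) $snp['pos'], $positions, true);
        if ($build !== false) {
            return $build;
        }
    }
    return 0;
}

$snps = [
    ['rsid' => 'rs0000001', 'pos' => 12345],   // placeholder SNP, ignored
    ['rsid' => 'rs3094315', 'pos' => 752566],  // Build 37 position
];
echo detectBuildFromPositions($snps), "\n"; // prints 37
```

Using `array_search()` replaces the hand-written lookup closure in the snippet. Returning 0 for "not detected" keeps the same convention that `readFile()` relies on before it falls back to Build 37.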

<?php
namespace Dna\Snps\IO;
use Dna\Snps\SNPsResources;
use League\Csv\Info;
use League\Csv\Reader as CsvReader;
use League\Csv\Statement;
use php_user_filter;
use ZipArchive;
stream_filter_register("extra_tabs_filter", ExtraTabsFilter::class) or die("Failed to register filter");
/**
* Class for reading and parsing raw data / genotype files.
*/
class Reader
{
/**
* Initialize a Reader.
*
* @param string $file
* Path to file to load or bytes to load.
* @param bool $_only_detect_source
* Flag to indicate if only detecting the source of the data.
* @param SNPsResources|null $resources
* Instance of Resources.
* @param array $rsids
* rsids to extract if loading a VCF file.
*/
public function __construct(
private string $file,
private bool $_only_detect_source,
private ?SNPsResources $resources,
private array $rsids
) {}
}
/**
* Read and parse a raw data / genotype file.
*
* @return array An array with the following items:
* - 'snps': Array of parsed SNPs.
* - 'source': Detected source of SNPs.
* - 'phased': Flag indicating if SNPs are phased.
* - 'build': Detected build of SNPs.
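
The array documented above is what `SNPs::readFile()` in the first snippet consumes. As a rough sketch of that hand-off, assuming a hand-written result in place of a real `Reader::read()` call (the helper name `summarizeReadResult` and the `genotype` key are illustrative assumptions):

```php
<?php
/**
 * Hypothetical helper that normalizes a read() result the way readFile() does.
 */
function summarizeReadResult(array $d): array
{
    return [
        // explode() already returns a one-element array when ", " is absent
        'sources' => explode(', ', $d['source']),
        'phased' => (bool) ($d['phased'] ?? false),
        // 0 stands for "no build detected"
        'build' => (int) ($d['build'] ?? 0),
        'build_detected' => !empty($d['build']),
        'snp_count' => count($d['snps']),
    ];
}

// Stand-in for the array a Reader would return
$result = [
    'snps' => [['rsid' => 'rs3094315', 'chrom' => '1', 'pos' => 752566, 'genotype' => 'AA']],
    'source' => '23andMe, AncestryDNA',
    'phased' => false,
    'build' => 37,
];
print_r(summarizeReadResult($result));
```

Because `explode()` handles the single-source case on its own, the `strpos()` check in `readFile()` is not strictly needed.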

php-dna/README.md

Lines 1 to 37 in 78a634f

# php-dna
## Requirements
* php-dna 1.0+ requires PHP 8.3 (or later).
## Installation
There are two ways of installing php-dna.
### Composer
To install php-dna in your project using composer, simply add the following require line to your project's `composer.json` file:
{
"require": {
"laravel-liberu/php-dna": "1.0.*"
}
}
### Download and __autoload
If you are not using composer, you can download an archive of the source from GitHub and extract it into your project. You'll need to set up an autoloader for the files, unless you go through the painstaking process of requiring all the needed files one by one. Something like the following should suffice:
```php
spl_autoload_register(function ($class) {
$pathToDna = __DIR__ . '/library/'; // TODO FIXME
if (substr(ltrim($class, '\\'), 0, 4) !== 'Dna\\') {
return;
}
$class = str_replace('\\', DIRECTORY_SEPARATOR, $class) . '.php';
if (file_exists($pathToDna . $class)) {
require_once($pathToDna . $class);
}
});

php-dna/phpconvcount.py

Lines 1 to 85 in 78a634f

import re
import ast
pycodefile = '../Projects/geneology/snps/tests/test_snps.py'
phpcodefile = 'tests/Snps/SnpsTest.php'
def normalize_function_name(name):
    # Check if the name is already in camelCase with mixed case
    if any(c.islower() and name[i+1:i+2].isupper() for i, c in enumerate(name[:-1])):
        return name
    # Handle snake_case to camelCase conversion
    name_parts = name.split('_')
    name = name_parts[0] + ''.join(word.strip().capitalize() for word in name_parts[1:])
    return name
def get_function_names_in_class(python_code, class_name):
    # Parse the Python code using the ast module
    parsed_code = ast.parse(python_code)
    # Initialize variables to track function names
    function_names = []
    # Helper function to extract function names from a class node
    def extract_function_names(class_node):
        names = []
        for node in ast.walk(class_node):
            if isinstance(node, ast.FunctionDef):
                names.append(node.name)
        return names
    # Traverse the parsed code and extract function names within the specified class
    for node in ast.walk(parsed_code):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            function_names.extend(extract_function_names(node))
    # Return the list of function names
    return function_names
# Step 1: Read Python Code from the File
with open(pycodefile, 'r') as python_file:
    python_code = python_file.read()
# Step 2: Extract Functions within the TestSnps Class
# Extract function names from the TestSnps class
python_functions = get_function_names_in_class(python_code, "TestSnps")
# Step 3: Normalize Python Function Names
normalized_python_functions = list(set(normalize_function_name(func) for func in python_functions))
# Step 4: Read PHP Code from the File
with open(phpcodefile, 'r') as php_file:
    php_code = php_file.read()
# Step 5: Extract PHP Function Names
php_functions = re.findall(r'(public|private|protected) function ([a-zA-Z_][a-zA-Z0-9_]*)\(', php_code)
php_functions = [name for (visibility, name) in php_functions]
# Step 6: Normalize PHP Function Names
normalized_php_functions = [normalize_function_name(func) for func in php_functions]
# Step 7: Compare Python and PHP Function Names
missing_functions = set(normalized_python_functions) - set(normalized_php_functions)
extra_functions = set(normalized_php_functions) - set(normalized_python_functions)
# Count of functions in Python and PHP
python_function_count = len(normalized_python_functions)
php_function_count = len(normalized_php_functions)
# Print the count of functions
print("Number of Functions in Python:", python_function_count)
print("Number of Functions in PHP:", php_function_count)
# print(normalized_python_functions)
# Print missing functions in PHP compared to Python
print("\nMissing Functions in PHP:")
for func in missing_functions:
    print(func)
print("\nExtra Functions in PHP:")
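
The name normalization that the script applies before comparing the two test suites can be mirrored on the PHP side. This is only a sketch under the assumption that names are ASCII; `normalizeFunctionName` is a hypothetical counterpart of `normalize_function_name()`, not something that exists in the repository.

```php
<?php
/**
 * Hypothetical PHP counterpart of normalize_function_name() from phpconvcount.py.
 */
function normalizeFunctionName(string $name): string
{
    // A lowercase letter directly followed by an uppercase letter is treated as
    // "already camelCase", and the name is returned unchanged.
    if (preg_match('/[a-z][A-Z]/', $name) === 1) {
        return $name;
    }
    $parts = explode('_', $name);
    $first = array_shift($parts);
    // Mirror Python's str.capitalize(): first letter upper case, the rest lower case
    $rest = array_map(
        fn (string $part): string => ucfirst(strtolower(trim($part))),
        $parts
    );
    return $first . implode('', $rest);
}

echo normalizeFunctionName('test_build_detected'), "\n"; // prints testBuildDetected
```

Note that, as in the Python version, an all-caps segment is lowercased after its first letter, so `test_deduplicate_XY_chrom` becomes `testDeduplicateXyChrom`.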

I also found the following external resources that might be helpful:

Summaries of links found in the content:

https://raw.githubusercontent.com/apriha/snps/master/src/snps/resources.py:

The page is a Python file named "resources.py" from the "snps" project. It contains a class called "Resources" that is responsible for downloading and loading external resources. The class has various methods for retrieving reference sequences, assembly mapping data, example datasets, and other resources used in the project. The class also has methods for downloading files, managing file paths, and loading data from files. Additionally, there is a class called "ReferenceSequence" that represents and interacts with a reference sequence. The class has properties for accessing the reference sequence's ID, URL, path, assembly, species, taxonomy, sequence, MD5 hash, start position, end position, and length. The code also includes import statements, variable assignments, and a license statement.
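
To make the "sequence, MD5 hash, and length" part of that summary concrete, here is a minimal sketch of loading a reference sequence in PHP. It assumes the file is FASTA, optionally gzip-compressed, and it needs PHP's zlib extension. The function name `loadReferenceSequence`, the returned array shape, and the demo header `>chr_demo` are assumptions, not the API of either project.

```php
<?php
/**
 * Hypothetical helper: load a (possibly gzip-compressed) FASTA file.
 *
 * @return array{sequence: string, md5: string, length: int}
 */
function loadReferenceSequence(string $path): array
{
    $data = file_get_contents($path);
    if ($data === false) {
        throw new RuntimeException("Unable to read {$path}");
    }
    // gzip streams start with the magic bytes 0x1f 0x8b
    if (str_starts_with($data, "\x1f\x8b")) {
        $data = gzdecode($data);
        if ($data === false) {
            throw new RuntimeException("Unable to decompress {$path}");
        }
    }
    $lines = preg_split('/\R/', trim($data));
    // Drop the FASTA header line (it starts with ">"), if present
    if ($lines !== [] && str_starts_with($lines[0], '>')) {
        array_shift($lines);
    }
    $sequence = implode('', $lines);
    return [
        'sequence' => $sequence,
        'md5' => md5($sequence),
        'length' => strlen($sequence),
    ];
}

// Usage with a small temporary file standing in for a downloaded resource
$path = tempnam(sys_get_temp_dir(), 'seq');
file_put_contents($path, gzencode(">chr_demo\nACGT\nNNAC\n"));
$ref = loadReferenceSequence($path);
echo $ref['sequence'], ' ', $ref['length'], "\n"; // prints ACGTNNAC 8
unlink($path);
```

Checking the magic bytes rather than the file extension lets the same helper read both compressed and plain files.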


Step 2: ⌨️ Coding

  • Modify src/Snps/ReferenceSequence.php (bbf22fe)
Modify src/Snps/ReferenceSequence.php with contents:
• Create a new PHP class file named "ReferenceSequence.php" in the "src/Snps/" directory.
• Define a class "ReferenceSequence" with properties for ID, URL, path, assembly, species, taxonomy, sequence, MD5 hash, start position, end position, and length, mirroring the Python version.
• Implement constructor and methods for accessing and manipulating the reference sequence properties.
• Ensure the class can be easily integrated with the "Resources" class for managing reference sequences.
--- 
+++ 
@@ -9,12 +9,12 @@
 {
 
     public function __construct(
-        private readonly string $ID = "",
-        private readonly string $url = "",
-        private readonly string $path = "",
-        private readonly string $assembly = "",
-        private readonly string $species = "",
-        private readonly string $taxonomy = ""
+        private string $ID = "",
+        private string $url = "",
+        private string $path = "",
+        private string $assembly = "",
+        private string $species = "",
+        private string $taxonomy = ""
     ) {
         $this->sequence = [];
         $this->md5 = "";
@@ -37,7 +37,7 @@
      */
     public function getID(): string
     {
-        return $this->_ID;
+        return $this->ID;
     }
 
     /**
@@ -47,7 +47,7 @@
      */
     public function getChrom(): string
     {
-        return $this->_ID;
+        return $this->ID;
     }
 
     /**
@@ -57,7 +57,7 @@
      */
     public function getUrl(): string
     {
-        return $this->_url;
+        return $this->url;
     }
 
     /**
@@ -67,7 +67,7 @@
      */
     public function getPath(): string
     {
-        return $this->_path;
+        return $this->path;
     }
 
     /**
@@ -77,7 +77,7 @@
      */
     public function getAssembly(): string
     {
-        return $this->_assembly;
+        return $this->assembly;
     }
 
     /**
@@ -87,7 +87,7 @@
      */
     public function getBuild(): string
     {
-        return "B" . substr($this->_assembly, -2);
+        return "B" . substr($this->assembly, -2);
     }
     /**
      * Returns the species of the reference sequence.
@@ -96,7 +96,7 @@
      */
     public function getSpecies(): string
     {
-        return $this->_species;
+        return $this->species;
     }
 
     /**
@@ -106,7 +106,7 @@
      */
     public function getTaxonomy(): string
     {
-        return $this->_taxonomy;
+        return $this->taxonomy;
     }
 
     /**
@@ -167,7 +167,7 @@
     {
         if (!count($this->sequence)) {
             // Decompress and read file
-            $data = file_get_contents($this->_path);
+            $data = file_get_contents($this->path);
 
             // check if file is gzipped
             if (str_starts_with($data, "\x1f\x8b")) {
  • Running GitHub Actions for src/Snps/ReferenceSequence.php
Check src/Snps/ReferenceSequence.php with contents:

Ran GitHub Actions for bbf22feba81623944ade7a52c9056a3025c98711:

Create src/Snps/Resources.php with contents:
• Refactor the existing "Resources.php" file to include methods for downloading and loading external resources, similar to the Python "resources.py" file.
• Implement methods for retrieving reference sequences, assembly mapping data, example datasets, and other resources used in the project.
• Add methods for downloading files, managing file paths, and loading data from files.
• Integrate the "ReferenceSequence" class within the "Resources" class to handle reference sequence data.
• Ensure all methods and properties are properly documented with PHPDoc comments for clarity and maintainability.
• Test the functionality of the "Resources" class to ensure it matches the expected behavior of the Python version.
  • Running GitHub Actions for src/Snps/Resources.php
Check src/Snps/Resources.php with contents:

Ran GitHub Actions for 4a2b316c17f94b930115383fa889262c62c9e589:
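
The ReferenceSequence.php diff above comes down to two points about constructor property promotion. First, a promoted property carries exactly the parameter's name, so the getters must read `$this->ID` rather than `$this->_ID`. Reading the undeclared `$this->_ID` yields null with a warning, which then violates the `string` return type. Second, the diff drops `readonly`. The thread does not say why; one plausible reason, stated here as an assumption, is that readonly properties cannot be reassigned later. The class below is a hypothetical sketch, not the repository's code.

```php
<?php
// Hypothetical class illustrating promoted constructor properties.
final class ReferenceSequenceSketch
{
    public function __construct(
        private string $ID = "",
        private string $path = ""
    ) {
    }

    public function getID(): string
    {
        // The promoted property has the parameter's name: $this->ID, not $this->_ID
        return $this->ID;
    }

    public function getPath(): string
    {
        return $this->path;
    }

    public function setPath(string $path): void
    {
        // Possible only because the property is not declared readonly
        $this->path = $path;
    }
}

// The ID and paths are made-up example values
$ref = new ReferenceSequenceSketch('NC_000001.11', '/tmp/a.fa.gz');
$ref->setPath('/tmp/b.fa.gz');
echo $ref->getID(), ' ', $ref->getPath(), "\n"; // prints NC_000001.11 /tmp/b.fa.gz
```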


Step 3: 🔁 Code Review

I have finished reviewing the code for completeness. I did not find errors for sweep/_de62a.


🎉 Latest improvements to Sweep:
  • New dashboard launched for real-time tracking of Sweep issues, covering all stages from search to coding.
  • Integration of OpenAI's latest Assistant API for more efficient and reliable code planning and editing, improving speed by 3x.
  • Use the GitHub issues extension for creating Sweep issues directly from your editor.

💡 To recreate the pull request, edit the issue title or description. To tweak the pull request, leave a comment on the pull request. Something wrong? Let us know.

This is an automated message generated by Sweep AI.
