
Evolutions of describe (questionr issue, closed)

larmarange commented on July 27, 2024
Evolutions of describe

from questionr.

Comments (7)

larmarange commented on July 27, 2024

If I remember correctly, the original describe was developed by @briatte.


briatte commented on July 27, 2024

I like that idea too, but I cannot remember much of the original describe function I wrote, which was basically a port from Stata.

If the idea is to make the output useful for data exploration, I would suggest adding "obs." next to the vector length and showing the percentage of missing values. I would also suggest, like memisc, "translating" factor into "Nominal", ordered factor into "Ordinal", and numeric/integer into "Numeric".

> describe(d$factor_var)
[2000 obs.] Categorie socio-professionnelle
Nominal: "Employe"    NA           "Technicien" "Technicien" "Employe"  ...
7 levels: Ouvrier specialise | Ouvrier qualifie | Technicien | Profession intermediaire 
| Cadre | Employe | Autre
NAs: 347 (17.4%)

I would even go as far as to suggest, through an argument like help = TRUE:

  • the most appropriate method to view stats, e.g. freq(x)
  • the most appropriate method to plot, e.g. plot(table(x))
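
The suggestions above can be sketched in a few lines of base R. This is only an illustration, not the code that ended up in questionr: `describe_sketch` is a hypothetical name, and the hints printed for non-factor vectors (`summary(x)`, `hist(x)`) are assumptions, since only `freq(x)` and `plot(table(x))` are mentioned above.

```r
# Hypothetical helper, not questionr's actual describe().
describe_sketch <- function(x, help = FALSE) {
  # Translate R classes into measurement-level terms, in the spirit of memisc.
  type <- if (is.ordered(x)) {
    "Ordinal"
  } else if (is.factor(x)) {
    "Nominal"
  } else if (is.numeric(x)) {
    "Numeric"
  } else {
    class(x)[1]
  }
  n <- length(x)
  n_na <- sum(is.na(x))
  # Percentage of missing values: multiply by 100 before rounding.
  pct_na <- if (n > 0) round(100 * n_na / n, 1) else 0
  out <- c(
    sprintf("[%d obs.]", n),
    sprintf("%s: %s ...", type, paste(head(as.character(x), 5), collapse = " ")),
    sprintf("NAs: %d (%s%%)", n_na, format(pct_na))
  )
  if (help) {
    # Hints are returned as plain text; nothing is evaluated here.
    stats_hint <- if (is.factor(x)) "freq(x)" else "summary(x)"    # summary(x) is an assumption
    plot_hint <- if (is.factor(x)) "plot(table(x))" else "hist(x)" # hist(x) is an assumption
    out <- c(out, paste("View stats:", stats_hint), paste("Plot:", plot_hint))
  }
  out
}

x <- factor(c("Employe", NA, "Technicien", "Technicien", "Employe"))
writeLines(describe_sketch(x, help = TRUE))
```

Returning a character vector rather than printing directly keeps the formatting easy to check and reuse.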


larmarange commented on July 27, 2024

You can test this commit larmarange@b681ccf

If it's OK, I will add it to a pull request.

I kept the original option of passing a list of variables when the input is a data.frame. It works with data.frame, data_frame and data.table.
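
For readers who want to see how such a selection might work, here is a minimal sketch in base R. It is not the code from the commit: `select_vars` is a hypothetical helper, and treating each argument as a regular expression matched against the column names is an assumption.

```r
# Hypothetical helper: returns the names of the columns to describe.
select_vars <- function(df, ...) {
  patterns <- c(...)
  # No pattern given: keep every column.
  if (length(patterns) == 0) return(names(df))
  # Indices of the columns matching at least one pattern (assumed to be regexes).
  hits <- unique(unlist(lapply(patterns, grep, x = names(df))))
  # Sort the indices so the result follows the column order of the data frame.
  names(df)[sort(hits)]
}

df <- data.frame(trav.imp = 1:2, trav.satisf = 3:4, lecture.bd = 5:6, cuisine = 7:8)
select_vars(df, "cuisine")
select_vars(df, "trav|lecture")
```

Because the result is just a character vector of column names, the same helper would apply to any object that supports `names()`.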

Some examples:

> describe(hdv2003$age)
[2000 obs.] 
integer: 28 23 59 34 71 ...
min: 18 - max: 97 - NAs: 0 (0%) - 78 unique values
> describe(hdv2003)
[2000 obs. x 20 variables] data.frame

$id: 
integer: 1 2 3 4 5 ...
min: 1 - max: 2000 - NAs: 0 (0%) - 2000 unique values

$age: 
integer: 28 23 59 34 71 ...
min: 18 - max: 97 - NAs: 0 (0%) - 78 unique values

$sexe: 
nominal factor: "Femme" "Femme" "Homme" "Homme" "Femme" ...
2 levels: Homme | Femme
NAs: 0 (0%)

$nivetud: 
nominal factor: "Enseignement superieur y compris technique superieur" NA "Derniere annee d'etudes primaires" "Enseignement superieur y compris technique superieur" "Derniere annee d'etudes primaires" ...
8 levels: N'a jamais fait d'etudes | A arrete ses etudes, avant la derniere annee d'etudes primaires | Derniere annee d'etudes primaires | 1er cycle | 2eme cycle | Enseignement technique ou professionnel court | Enseignement technique ou professionnel long | Enseignement superieur y compris technique superieur
NAs: 112 (0.1%)

$poids: 
numeric: 2634.3982157 9738.3957759 3994.1024587 5731.6615081 4329.0940022 ...
min: 78.0783403 - max: 31092.14132 - NAs: 0 (0%) - 1877 unique values

$occup: 
nominal factor: "Exerce une profession" "Etudiant, eleve" "Exerce une profession" "Exerce une profession" "Retraite" ...
7 levels: Exerce une profession | Chomeur | Etudiant, eleve | Retraite | Retire des affaires | Au foyer | Autre inactif
NAs: 0 (0%)

$qualif: 
nominal factor: "Employe" NA "Technicien" "Technicien" "Employe" ...
7 levels: Ouvrier specialise | Ouvrier qualifie | Technicien | Profession intermediaire | Cadre | Employe | Autre
NAs: 347 (0.2%)

$freres.soeurs: 
integer: 8 2 2 1 0 ...
min: 0 - max: 22 - NAs: 0 (0%) - 19 unique values

$clso: 
nominal factor: "Oui" "Oui" "Non" "Non" "Oui" ...
3 levels: Oui | Non | Ne sait pas
NAs: 0 (0%)

$relig: 
nominal factor: "Ni croyance ni appartenance" "Ni croyance ni appartenance" "Ni croyance ni appartenance" "Appartenance sans pratique" "Pratiquant regulier" ...
6 levels: Pratiquant regulier | Pratiquant occasionnel | Appartenance sans pratique | Ni croyance ni appartenance | Rejet | NSP ou NVPR
NAs: 0 (0%)

$trav.imp: 
nominal factor: "Peu important" NA "Aussi important que le reste" "Moins important que le reste" NA ...
4 levels: Le plus important | Aussi important que le reste | Moins important que le reste | Peu important
NAs: 952 (0.5%)

$trav.satisf: 
nominal factor: "Insatisfaction" NA "Equilibre" "Satisfaction" NA ...
3 levels: Satisfaction | Insatisfaction | Equilibre
NAs: 952 (0.5%)

$hard.rock: 
nominal factor: "Non" "Non" "Non" "Non" "Non" ...
2 levels: Non | Oui
NAs: 0 (0%)

$lecture.bd: 
nominal factor: "Non" "Non" "Non" "Non" "Non" ...
2 levels: Non | Oui
NAs: 0 (0%)

$peche.chasse: 
nominal factor: "Non" "Non" "Non" "Non" "Non" ...
2 levels: Non | Oui
NAs: 0 (0%)

$cuisine: 
nominal factor: "Oui" "Non" "Non" "Oui" "Non" ...
2 levels: Non | Oui
NAs: 0 (0%)

$bricol: 
nominal factor: "Non" "Non" "Non" "Oui" "Non" ...
2 levels: Non | Oui
NAs: 0 (0%)

$cinema: 
nominal factor: "Non" "Oui" "Non" "Oui" "Non" ...
2 levels: Non | Oui
NAs: 0 (0%)

$sport: 
nominal factor: "Non" "Oui" "Oui" "Oui" "Non" ...
2 levels: Non | Oui
NAs: 0 (0%)

$heures.tv: 
numeric: 0 1 0 2 3 ...
min: 0 - max: 12 - NAs: 5 (0%) - 30 unique values
> describe(hdv2003, "cuisine", "heures.tv")
[2000 obs. x 2 variables] data.frame

$cuisine: 
nominal factor: "Oui" "Non" "Non" "Oui" "Non" ...
2 levels: Non | Oui
NAs: 0 (0%)

$heures.tv: 
numeric: 0 1 0 2 3 ...
min: 0 - max: 12 - NAs: 5 (0%) - 30 unique values
> describe(hdv2003, "trav*")
[2000 obs. x 2 variables] data.frame

$trav.imp: 
nominal factor: "Peu important" NA "Aussi important que le reste" "Moins important que le reste" NA ...
4 levels: Le plus important | Aussi important que le reste | Moins important que le reste | Peu important
NAs: 952 (0.5%)

$trav.satisf: 
nominal factor: "Insatisfaction" NA "Equilibre" "Satisfaction" NA ...
3 levels: Satisfaction | Insatisfaction | Equilibre
NAs: 952 (0.5%)
> describe(hdv2003, "trav|lecture")
[2000 obs. x 3 variables] data.frame

$trav.imp: 
nominal factor: "Peu important" NA "Aussi important que le reste" "Moins important que le reste" NA ...
4 levels: Le plus important | Aussi important que le reste | Moins important que le reste | Peu important
NAs: 952 (0.5%)

$trav.satisf: 
nominal factor: "Insatisfaction" NA "Equilibre" "Satisfaction" NA ...
3 levels: Satisfaction | Insatisfaction | Equilibre
NAs: 952 (0.5%)

$lecture.bd: 
nominal factor: "Non" "Non" "Non" "Non" "Non" ...
2 levels: Non | Oui
NAs: 0 (0%)

> describe(femmes)
[2000 obs. x 17 variables] tbl_df tbl data.frame

$id_femme: Identifiant de l'enquêtée
integer: 391 1643 85 881 1981 ...
min: 1 - max: 2000 - NAs: 0 (0%) - 2000 unique values

$id_menage: Identifiant du ménage
integer: 381 1515 85 844 1797 ...
min: 1 - max: 1814 - NAs: 0 (0%) - 1814 unique values

$poids: Poids statistique
numeric: 1.80315 1.80315 1.80315 1.80315 1.80315 ...
min: 0.044629 - max: 4.396831 - NAs: 0 (0%) - 351 unique values

$date_entretien: Date de passation du questionnaire
Date: 2012-05-05 2012-01-23 2012-01-21 2012-01-06 2012-05-11 ...
min: 2011-12-01 - max: 2012-05-31 - NAs: 0 (0%) - 165 unique values

$date_naissance: Date de naissance
Date: 1997-03-07 1982-01-06 1979-01-01 1968-03-29 1986-05-25 ...
min: 1962-02-07 - max: 1997-03-13 - NAs: 0 (0%) - 1740 unique values

$age: Âge révolu (en années) à la date de passation du questionnaire
numeric: 15 30 33 43 25 ...
min: 14 - max: 49 - NAs: 0 (0%) - 36 unique values

$milieu: Milieu de résidence
labelled numeric: 2 2 2 2 2 ...
2 labels: [1] urbain [2] rural
min: 1 - max: 2 - NAs: 0 (0%) - 2 unique values

$region: Région de résidence
labelled numeric: 4 4 4 4 4 ...
4 labels: [1] Nord [2] Est [3] Sud [4] Ouest
min: 1 - max: 4 - NAs: 0 (0%) - 4 unique values

$educ: Niveau d'éducation
labelled numeric: 0 0 0 0 1 ...
4 labels: [0] aucun [1] primaire [2] secondaire [3] supérieur
min: 0 - max: 3 - NAs: 0 (0%) - 4 unique values

$travail: A un emploi ?
labelled numeric: 1 1 0 1 1 ...
2 labels: [0] non [1] oui
min: 0 - max: 9 - NAs: 0 (0%) - 3 unique values

$matri: Statut matrimonial
labelled numeric: 0 2 2 2 1 ...
6 labels: [0] célibataire [1] mariée [2] en concubinage [3] veuve [4] divorcée [5] séparée
min: 0 - max: 5 - NAs: 0 (0%) - 6 unique values

$religion: Religion
labelled numeric: 1 3 2 3 2 ...
5 labels: [1] musulmane [2] chrétienne [3] protestante [4] sans religion [5] autre
min: 1 - max: 5 - NAs: 4 (0%) - 6 unique values

$journal: Lit la presse ?
labelled numeric: 0 0 0 0 0 ...
2 labels: [0] non [1] oui
min: 0 - max: 1 - NAs: 0 (0%) - 2 unique values

$radio: Ecoute la radio ?
labelled numeric: 0 1 1 0 0 ...
2 labels: [0] non [1] oui
min: 0 - max: 1 - NAs: 0 (0%) - 2 unique values

$tv: Regarde la télévision ?
labelled numeric: 0 0 0 0 0 ...
2 labels: [0] non [1] oui
min: 0 - max: 1 - NAs: 0 (0%) - 2 unique values

$nb_enf_ideal: Nombre idéal d'enfants
labelled numeric: 4 4 4 4 4 ...
1 labels: [96] Ne sait pas
min: 0 - max: 99 - NAs: 0 (0%) - 18 unique values

$test: A déjà fait un test de dépistage du VIH ?
labelled numeric: 0 9 0 0 1 ...
2 labels: [0] non [1] oui
min: 0 - max: 9 - NAs: 0 (0%) - 3 unique values
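
The "labels" lines in the output for femmes can be reproduced from the attributes of a labelled vector. The sketch below builds such a vector by hand in base R, with value labels in the `labels` attribute and the variable label in the `label` attribute, as haven stores them. `format_labels` is a hypothetical helper and not the function used in the commit.

```r
# A hand-built labelled vector (no package required).
milieu <- structure(
  c(2, 2, 1, 2, 1),
  labels = c(urbain = 1, rural = 2),
  label = "Milieu de résidence"
)

# Hypothetical helper: formats value labels as "[value] label".
format_labels <- function(x) {
  labs <- attr(x, "labels")
  sprintf(
    "%d labels: %s",
    length(labs),
    paste0("[", labs, "] ", names(labs), collapse = " ")
  )
}

cat("$milieu:", attr(milieu, "label"), "\n")
cat(format_labels(milieu), "\n")
```

Comparing the labelled values with the observed range is a quick way to spot codes that have no label, as with the maximum of 9 reported for travail and test above.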


briatte commented on July 27, 2024

Looks pretty good to me, very helpful output that immediately shows things that would require two or three functions to get in base R.
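
As a rough illustration of that point, here is one way to collect the same pieces of information with separate base R calls. The vector is made up for the example, and the mapping of each call to a part of the describe output is an approximation.

```r
# Made-up integer vector with one missing value.
age <- c(28L, 23L, 59L, 34L, 71L, NA)

class(age)                # storage type
head(age, 5)              # first values
range(age, na.rm = TRUE)  # min and max, ignoring NAs
sum(is.na(age))           # number of missing values
length(unique(age))       # distinct values (NA counts as one)
```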


larmarange commented on July 27, 2024

I have prepared a new pull request (#54) covering the labelled functions, freq, lookfor, describe and ltabs.


larmarange commented on July 27, 2024

cf. Pull Request #57


larmarange commented on July 27, 2024

cf. #72
