
libnds's Introduction


libnds's People

Contributors

antoniond, awiebe, bakawun, bl0ckeduser, chishm, cturt, d3m3vilurr, dovoto, duckonaut, endrift, fincs, frankhb, ichfly, mtheall, rasky, wintermute

libnds's Issues

Wifi is broken in DSi mode

None of the 3 wifi examples work under sudokuhax on DSi or 3DS. Homebrew that uses wifi functions is broken too. The init functions return errors.

swiWaitForVBlank and swiIntrWait not functioning

At some point in the past (not sure when), swiIntrWait(0, 1) stopped working in ds mode. (It should wait for a vblank interrupt if it has not occurred yet.) It never worked in dsi mode. I already had a workaround for this, using swiWaitForVBlank instead, so I just used that in both dsi mode and regular mode.

After commit 469c168, swiWaitForVBlank doesn't work either, in any mode. It's weird, because the logic in that commit is the reverse of my experience (swiIntrWait was the one that was broken, swiWaitForVBlank worked in both modes).

I don't mean to discredit your work, fincs... =P

Can we have socket library stuff stored inside of libsocket.a as well?

I am trying to modify a program to work on the ds, but I keep running into problems. The biggest problem is that it requires makefile configuration through a ./configure file. While most of the errors I can fix by simply making some small changes to the source code of the program, one thing I can't figure out how to fix is that it is specifically looking for socket functions inside of libsocket.a, a library that does not exist as all of the socket stuff is in libdswifi9.a.

Any help would be appreciated. Thanks.

Check if faster floating point functions using existing hardware capabilities are possible

Feature Request

What feature are you suggesting?

Overview:

https://datasheets.raspberrypi.com/pico/raspberry-pi-pico-c-sdk.pdf

The Raspberry Pi Pico C SDK documentation linked above seems to suggest that GCC's soft-float emulation is not very fast and could be 50-100% faster.

Smaller Details:

I have written some small implementations that could be used for this purpose. What still needs to be done is to benchmark them and look for possible optimizations.

For example:

f32 fastfloat(s32 x){
    //convert a 20.12 fixed point number to float without using fmul
    union{
        f32 fnum;
        u32 original;
    } data; //union because pointer casting isn't compiler sanctioned
    data.original=(u32)x;
    if (x==0) return data.fnum; //an all-zero bit pattern is already 0.0f
    u32 f= x<0 ? 0u-(u32)x : (u32)x; //absolute value, done unsigned so the most negative input cannot overflow
    s32 clz=__builtin_clz(f); //make sure this is the appropriate clz for your system; should return 31 for the input 1
    //clz is being used as a stand-in for log2 here
    s32 shift=clz-8;
    //adjust this if you work with a different float type. The value 8 comes from 31-23, so for 64 bit it'd be 63-52
    u32 mantissa=(shift>=0 ? f<<shift : f>>-shift)-(1u<<23);
    //adjust this if you work with a different float type. 23 is the number of explicitly stored bits in the mantissa.
    //can also replace the subtraction with a bitwise & ((1u<<23)-1)
    u32 exponent=(u32)(138-shift);
    //adjust this if you work with a different fixed point type. 138 is 150-12, and 150 is 127+23, so for a double it'd probably be 1023+52-12
    exponent=exponent<<23; //adjust this for a different float type
    u32 sign=(u32)x & (1u<<31); //adjust this for a different float type
    data.original=sign|exponent|mantissa;
    return data.fnum;
}

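f32 fsqrt(f32 x){
    //note: the sign bit is ignored, so negative inputs are treated like their absolute value
    union{f32 f; u32 i;}xu;
    xu.f=x;
    //grab exponent
    s32 exponent=(xu.i & (0xff<<23));
    if(exponent==0)return 0.0f; //zero and denormals both become 0
    exponent=exponent-(127<<23);

    exponent=exponent>>1; //right shift on negative number depends on compiler
    u64 mantissa=xu.i & ((1<<23)-1);
    mantissa=(mantissa+(1<<23))<<23; //restore the implicit 1 and scale so the root lands in 24 bits
    if ((exponent & (1<<22))>0){
        //the original exponent was odd, so fold the leftover factor of 2 into the mantissa
        mantissa=mantissa<<1;
    }
    u32 new_mantissa=(u32) sqrt(mantissa); //needs <math.h>; modify this line to use the NDS 64 bit hardware sqrt
    xu.i=((exponent+(127<<23)) & (0xff<<23)) | (new_mantissa & ((1<<23)-1));
    return xu.f;
}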

This could be used for sqrtf(), and it would most likely be orders of magnitude faster than the implementation in <math.h>.
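The `sqrt(mantissa)` call in fsqrt above still goes through double precision. As a rough sketch of what an integer-only replacement could look like when testing off-target, here is a portable digit-by-digit 64-bit square root. The name `isqrt64` is made up for this example; on real hardware this loop would presumably be replaced by the DS square-root unit.

```c
#include <stdint.h>

/* Hypothetical helper: floor(sqrt(n)) using only shifts, adds and compares.
   Digit-by-digit (binary) method, one result bit per iteration. */
uint32_t isqrt64(uint64_t n)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;   /* highest power of four in 64 bits */

    while (bit > n)                     /* skip leading zero digit pairs */
        bit >>= 2;

    while (bit != 0) {
        if (n >= root + bit) {          /* does this result bit fit? */
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)root;
}
```

With the mantissa scaled into the range 2^46 to 2^48 as in fsqrt, the result lands in the range 2^23 to 2^24, which is what the final masking step expects.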

In projects where an error of at most 1 part in a hundred thousand is allowable (smaller than 2**-16; most of the time it is around 1-3 parts in 1 million), fmuls could, for example, be done as shown below. This compiles to about 31 instructions in ARM mode, and the assembly could potentially be hand-optimized (for example, by replacing the mul with an SMULWxy).

This can be made more accurate by using a long multiply for the product of i and j and shifting the result down by 23, i.e. computing ((u64)i*j)>>23 in place of ((i>>7)*(j>>7))>>9 in the line u32 mantissa=i+j+(((i>>7)*(j>>7))>>9), but it costs some speed and requires more registers.

f32 fmul(f32 x, f32 y){
    union {
        u32 i;
        f32 f;
    }xu;
    xu.f=x;
    union {
        u32 j;
        f32 f;
    }yu;
    yu.f=y;
    u32 a=xu.i & (1u<<31); //get sign of x
    u32 b=yu.j & (1u<<31); //get sign of y
    u32 exponentx=(xu.i) & (0xff<<23);
    u32 exponenty=(yu.j) & (0xff<<23);
    s32 combined_exponent=(exponentx-(127<<23))+exponenty; //remove one bias so the sum is biased once
    if (combined_exponent<=(1<<23)){
        return 0.0f; //this deletes sign information but I don't care
    }
    u32 i=(xu.i & 0x7fffff); //should be explicit bits of mantissa
    u32 j=(yu.j & 0x7fffff);
    u32 mantissa=i+j+(((i>>7)*(j>>7))>>9); //(1+a)*(1+b)-1 = a+b+a*b, with a*b truncated to 16x16 bits
    mantissa=mantissa+(1<<23); //add the implicit leading 1 back
    s32 clz=__builtin_clz(mantissa);
    s32 shift=8-clz; //0 or 1, depending on whether the product reached 2.0
    mantissa= shift>=0 ? mantissa>>shift : mantissa<<-shift;
    mantissa=mantissa & ((1<<23)-1);
    u32 exponent=(combined_exponent+(shift<<23));
    u32 sign=a^b;
    xu.i=sign|exponent|mantissa;
    return xu.f;
}
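To make the speed/accuracy trade-off in the mantissa product concrete, the sketch below isolates just that step and computes it both ways: with the truncated 16x16-bit multiply used in fmul, and with a 64-bit long multiply. The function names are invented for this example, and `i` and `j` are assumed to be the 23 explicit mantissa bits of the two operands.

```c
#include <stdint.h>

/* Fast variant: drops the low 7 bits of each operand so the product fits in 32 bits. */
uint32_t mant_mul_fast(uint32_t i, uint32_t j)
{
    return i + j + (((i >> 7) * (j >> 7)) >> 9);
}

/* More accurate variant: full 64-bit product, then shift down by 23. */
uint32_t mant_mul_long(uint32_t i, uint32_t j)
{
    return i + j + (uint32_t)(((uint64_t)i * j) >> 23);
}
```

For the largest mantissas (i = j = 0x7FFFFF) the two results differ by 254 units in the last place, which gives a feel for where the quoted error of a few parts per million comes from. The fast variant can only round down relative to the long one.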

In addition, a combination of fast exp() and log() functions could be used to replace pow() when the error on both is small enough.
Cawley, G. C. (2000). On a Fast, Compact Approximation of the Exponential Function. Neural Computation. This paper has a floating-point exp approximation that uses only 4-5 integer additions and doesn't require an FPU.

Vinyals, O., & Friedland, G. (2008). A Hardware-Independent Fast Logarithm Approximation with Adjustable Accuracy. 2008 Tenth IEEE International Symposium on Multimedia. This paper seems to have a log implementation that could be adapted.
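As an illustration of the general idea behind such exp approximations (building the IEEE 754 bit pattern directly with integer arithmetic), here is a minimal sketch. It is not taken from either cited paper: the 20.12 fixed-point input, the function name, the scale factor 2955, and the correction constant 486411 are all assumptions chosen for this example, and the error is a few percent.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: approximate exp(x) for a 20.12 fixed-point x.
   Since exp(x) = 2^(x / ln 2), writing x / ln 2 into the exponent field
   (and letting the fraction spill into the mantissa) gives a piecewise
   linear approximation of the exponential. */
float fast_exp_fx12(int32_t x)
{
    /* Clamp so the result stays a finite, normal float (about e^-87 .. e^88). */
    if (x < -87 * 4096) x = -87 * 4096;
    if (x >  88 * 4096) x =  88 * 4096;

    /* 2955 ~= 2^23 / ln(2) / 2^12 rescales 20.12 input to exponent-field units.
       486411 is an assumed empirical offset that centres the error. */
    int32_t bits = x * 2955 + ((127 << 23) - 486411);

    float out;
    memcpy(&out, &bits, sizeof out);    /* reinterpret the bits as a float */
    return out;
}
```

Only one integer multiply and one add are involved; whether the accuracy is good enough to pair with a fast log for pow() would need to be measured.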

Nature of Request:

  • Addition

Why would this feature be useful?

It makes using floating point functions faster, mostly useful for porting existing code more quickly.

Projects that are written from scratch should not use these functions, as they would still be slower than fixed point.
An alternative could be libraries that use fixed point with a syntax that makes it possible to use them as if they were floating point numbers, for example https://github.com/MikeLankamp/fpm .
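For comparison, this is roughly what the fixed-point route looks like in plain C for the 20.12 format used in the snippets above. The type and function names here are invented for the example and are not libnds or fpm API.

```c
#include <stdint.h>

typedef int32_t fx12;                   /* hypothetical 20.12 fixed-point type */
#define FX12_ONE (1 << 12)

static inline fx12 fx12_from_int(int v)
{
    return (fx12)(v * FX12_ONE);
}

/* Multiply: widen to 64 bits, then drop the extra 12 fractional bits. */
static inline fx12 fx12_mul(fx12 a, fx12 b)
{
    return (fx12)(((int64_t)a * b) / FX12_ONE);
}

/* Divide: pre-scale the dividend so the quotient keeps 12 fractional bits. */
static inline fx12 fx12_div(fx12 a, fx12 b)
{
    return (fx12)(((int64_t)a * FX12_ONE) / b);
}
```

A multiply here is one long multiply plus a scale step, with no normalization or exponent handling, which is why it stays cheaper than any soft-float routine.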

Avoid nand init by default

Would it be possible to avoid NAND init by default? It breaks no$gba DSi mode, it is useless to most homebrew, which only needs SD access, and I would also prefer not to touch NAND at all for safety reasons.

Compiler warnings

The code in the headers can make the compiler complain a lot when certain warning options are specified, e.g. -Wpacked, -Wsign-conversion, and also -Wpedantic (due to redundant semicolons, etc.). This is annoying for users who want to enable these options to check their own code.
Perhaps the simplest fix is adding #pragma GCC system_header to each header.
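Until something like that lands in the headers, a user-side workaround is to suppress the warnings only around the include. The sketch below shows the pattern with GCC's diagnostic pragmas; the inline function is a stand-in for the header contents so the example compiles on its own, and its name is made up.

```c
/* Save the current warning state, silence the noisy options, restore afterwards. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wsign-conversion"
#pragma GCC diagnostic ignored "-Wpedantic"

/* In a real project this would be `#include <nds.h>`.
   Stand-in for header code that trips -Wsign-conversion: */
static inline unsigned legacy_scale(unsigned value, int factor)
{
    return value * factor;              /* implicit int -> unsigned conversion */
}

#pragma GCC diagnostic pop
/* From here on, the project's own code is checked with the full warning set. */
```

Passing the include directory with -isystem instead of -I should have a similar effect, since GCC then treats everything under it as system headers.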

FIFO sync is broken when loading from flashcart

Using an M3DS Real, for instance, basic_sound's mod playback runs, but its ambulance and boom noises don't work. Other sound FIFO usage is also affected.
(However, it works fine when using a compatible hbmenu, and on emulators.)

I tracked it down to this commit: 5e320f9

where is libfilesystem source?

I'm building the toolchain and libs myself on a non-x86 host, and some of the nitrofs-related projects in nds-examples can't find libfilesystem.

cheers

Better detection of the touchscreen controller mode

If libnds detects that some DSi SCFG registers are available, it immediately switches the touchscreen handling code to TWL controller mode. This is not how the hardware works: the controller mode is not linked to any SCFG register values, so this code is wrong. The touchscreen controller mode is an independent feature controlled by bit 24 of 0x1BC in the SRL or CIA. There is no known way to switch it at runtime.
All other DSi features, including the new BIOS, can be switched at runtime provided the application has unlocked access to the SCFG registers on ARM7 and ARM9 (given by bit 31 of the 0x1B8 registers = 1).

It should be possible to detect the current mode used by analyzing the response to some SPI commands. It is only needed to determine that once since it cannot be changed afterwards.

getHeapLimit() and ARM7 binary entrypoint

While testing, I found that the address returned by getHeapLimit() is greater than the default ARM7 binary entrypoint, so I assume that malloc() allocations will overlap it when RAM is almost fully used.

Can't compile...

C:/devkitPro/examples/nds/audio/maxmod/audio_modes/source/main.c:25:10: fatal error: nds.h: No such file or directory
25 | #include <nds.h>

Faster integer sine/cosine

Feature Request

What feature are you suggesting?

Overview:

Adding a fast integer sine/cosine without LUTs to libnds

http://www.coranac.com/2009/07/sines/

Smaller Details:

This can most likely be modified to yield a cosine by removing the phase shift, specifically this line:
sub r0, r0, #1<<31 @ r0 -= 1.0 ; sin <-> cos

@ ARM assembly version of S4 = C4(gamma-1), using n=13, A=12 and ... miscellaneous.

@ A sine approximation via a fourth-order cosine
@ @param r0   Angle (with 2^15 units/circle)
@ @return     Sine value (Q12)
    .arm
    .align
    .global isin_S4a9
isin_S4a9:
    movs    r0, r0, lsl #(31-13)    @ r0=x%2 <<31       ; carry=x/2
    sub     r0, r0, #1<<31          @ r0 -= 1.0         ; sin <-> cos
    smulwt  r1, r0, r0              @ r1 = x*x          ; Q31*Q15/Q16=Q30
   
    ldr     r2,=14016               @ C = (1-pi/4)<<16
    smulwt  r0, r2, r1              @ C*x^2>>16         ; Q16*Q14/Q16 = Q14
    add     r2, r2, #1<<16          @ B = C+1
    rsb     r0, r0, r2, asr #2      @ B - C*x^2         ; Q14
    smulwb  r0, r1, r0              @ x^2 * (B-C*x^2)   ; Q30*Q14/Q16 = Q28
    mov     r1, #1<<12
    sub     r0, r1, r0, asr #16     @ 1 - x^2 * (B-C*x^2)
    rsbcs   r0, r0, #0              @ Flip sign for odd semi-circles.
   
    bx      lr


Nature of Request:

Addition

Why would this feature be useful?

4x Faster sine/cosine calculation than LUTs


Audio timer value calculated incorrectly

#define SOUND_FREQ(n) ((-0x1000000 / (n)))

In audio.h, a rounded version of the audio timer frequency is used, while it should actually be BUS_CLOCK/2 (16756991). This causes issues with streaming music if you run a timer that should have exactly the same frequency to keep track of blocks in a ring buffer, as it will drift out of sync. (A temporary solution is to use this bad calculation on ARM9 as well, but that's very lame.)
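The size of the mismatch can be worked out on a PC. The sketch below compares the timer value produced by the rounded constant with one derived from BUS_CLOCK/2, and computes the rate the hardware would actually run at for a given timer value. The helper names are invented for this example; 33513982 is twice the 16756991 quoted above.

```c
#include <stdint.h>

#define BUS_CLOCK_HZ 33513982           /* 2 * 16756991 */

/* What SOUND_FREQ(n) currently evaluates to. */
int32_t sound_timer_rounded(int32_t hz)
{
    return -0x1000000 / hz;
}

/* The same calculation based on the real sound clock. */
int32_t sound_timer_exact(int32_t hz)
{
    return -(BUS_CLOCK_HZ / 2) / hz;
}

/* Rate (truncated to whole Hz) that a given timer value really produces. */
int32_t actual_rate_hz(int32_t timer)
{
    return (BUS_CLOCK_HZ / 2) / -timer;
}
```

For a requested 32768 Hz, the rounded macro yields -512, which really plays at about 32728 Hz, while the clock-based calculation yields -511. A second timer programmed from BUS_CLOCK therefore cannot stay in step with a channel programmed via SOUND_FREQ.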

Access to sdmmc sdcard device from arm7 is not working

These two functions are declared in include/nds/arm7/sdmmc.h but not implemented anywhere:
int sdmmc_sdcard_readsectors(u32 sector_no, u32 numsectors, void *out);
int sdmmc_sdcard_writesectors(u32 sector_no, u32 numsectors, void *in);

Please make a release!

Thanks for your hard work on this project. I'm finding that #52 is substantially improving rendering in my application (which needs to make on-the-fly texture loads because it involves a very large number of textures, only a small proportion of which are onscreen at any given point). It would be great to have this patch available straight from the pacman package instead of needing the user to build and install from source. Thanks for your consideration!

Undefined reference when using `std::atomic<unsigned>`

Hi, first of all thank you a lot for your amazing work !

Bug Report

I get linker errors when using std::atomic<unsigned> in my code :

/opt/devkitpro/devkitARM/bin/../lib/gcc/arm-none-eabi/14.1.0/../../../../arm-none-eabi/bin/ld: main.o: in function `std::__atomic_base<unsigned int>::fetch_add(unsigned int, std::memory_order)':
/opt/devkitpro/devkitARM/arm-none-eabi/include/c++/14.1.0/bits/atomic_base.h:631:(.text+0x74): undefined reference to `__atomic_fetch_add_4'
/opt/devkitpro/devkitARM/bin/../lib/gcc/arm-none-eabi/14.1.0/../../../../arm-none-eabi/bin/ld: /opt/devkitpro/devkitARM/arm-none-eabi/include/c++/14.1.0/bits/atomic_base.h:631:(.text+0xc8): undefined reference to `__atomic_fetch_add_4'

I'm compiling with -std=gnu++23 flag, but I also got the error with -std=gnu++20 and -std=gnu++17.
I use Arch Linux and I installed devkitpro using pacman. I haven't modified devkitpro.
My environment variables :

DEVKITARM=/opt/devkitpro/devkitARM
DEVKITPPC=/opt/devkitpro/devkitPPC
DEVKITPRO=/opt/devkitpro
PATH=/opt/devkitpro/tools/bin:/opt/devkitpro/portlibs/nds/bin:/opt/devkitpro/devkitARM/arm-none-eabi/bin:/opt/devkitpro/devkitARM/bin:/opt/devkitpro/tools/bin

Output of arm-none-eabi-g++ -v :

Using built-in specs.
COLLECT_GCC=arm-none-eabi-g++
COLLECT_LTO_WRAPPER=/opt/devkitpro/devkitARM/bin/../libexec/gcc/arm-none-eabi/14.1.0/lto-wrapper
Target: arm-none-eabi
Configured with: ../../gcc-14.1.0/configure --enable-languages=c,c++,objc,lto --with-gnu-as --with-gnu-ld --with-gcc --with-march=armv4t --enable-cxx-flags=-ffunction-sections --disable-libstdcxx-verbose --enable-poison-system-directories --enable-interwork --enable-multilib --enable-threads --disable-win32-registry --disable-nls --disable-debug --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch --enable-libstdcxx-time=yes --enable-libstdcxx-filesystem-ts --target=arm-none-eabi --with-newlib --with-headers=../../newlib-4.4.0.20231231/newlib/libc/include --prefix=/home/davem/projects/devkitpro/tool-packages/devkitARM/src/build/x86_64-linux-gnu/devkitARM --enable-lto --disable-tm-clone-registry --disable-__cxa_atexit --with-bugurl=http://wiki.devkitpro.org/index.php/Bug_Reports --with-pkgversion='devkitARM release 64' --build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu --with-gmp= --with-mpfr= --with-mpc= --with-isl= --with-zstd= --with-gmp=/opt/devkitpro/crosstools/x86_64-linux-gnu --with-mpfr=/opt/devkitpro/crosstools/x86_64-linux-gnu --with-mpc=/opt/devkitpro/crosstools/x86_64-linux-gnu --with-isl=/opt/devkitpro/crosstools/x86_64-linux-gnu --with-zstd=/opt/devkitpro/crosstools/x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.1.0 (devkitARM release 64)

I only use libnds in my code, no other dependency.

How to fix it

After googling the undefined reference message, I found this github issue and adding this function solves the problem for me :

extern "C" unsigned __atomic_fetch_add_4(volatile void *ptr, unsigned val, int memmodel) {
	(void)memmodel;
	unsigned tmp = *(unsigned*)ptr;
	*(unsigned*)ptr = tmp + val;
	return tmp;
}

About `std::chrono` clocks

It seems that std::chrono::system_clock::now(), std::chrono::steady_clock::now() and std::chrono::high_resolution_clock::now() do not work.

Current libstdc++ is configured with _GLIBCXX_USE_GETTIMEOFDAY, which uses gettimeofday for this. However, the precision is too low.

To make it work effectively, custom implementation with timers should be used.

The function gettimeofday is not quite fit for this purpose. It is also obsolescent in POSIX.

Even if clock_gettime could be used by libstdc++, I cannot find where to override it.

So I propose that:

  • Setting _GLIBCXX_USE_CLOCK_REALTIME instead of _GLIBCXX_USE_GETTIMEOFDAY.
  • Allowing user to implement clock_gettime, this can be achieved by:
    • Providing a stub of clock_gettime (with __attribute__((weak))).
    • Providing a system call entry to allow the user override at runtime is also good.

Alternatively, patch libstdc++-v3/src/c++11/chrono.cc in current libstdc++ source to use custom implementation. This seems to be most efficient since it can avoid unnecessary 10^n multiplication required by clock_gettime or gettimeofday results.
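Whichever hook ends up being used, the core of a timer-backed clock is converting a tick count into seconds and nanoseconds. A minimal sketch of that conversion is below, assuming a hypothetical 64-bit tick counter that advances at the 33513982 Hz bus clock; the function name and the counter itself are assumptions, not existing libnds API.

```c
#include <stdint.h>
#include <time.h>

#define BUS_CLOCK_HZ 33513982ULL        /* assumed tick rate of the counter */

/* Convert a monotonically increasing tick count into a struct timespec,
   the form a user-provided clock_gettime would have to return. */
struct timespec ticks_to_timespec(uint64_t ticks)
{
    struct timespec ts;
    uint64_t rem = ticks % BUS_CLOCK_HZ;            /* ticks within the current second */

    ts.tv_sec  = (time_t)(ticks / BUS_CLOCK_HZ);
    ts.tv_nsec = (long)(rem * 1000000000ULL / BUS_CLOCK_HZ);
    return ts;
}
```

Splitting off the whole seconds first keeps the nanosecond multiplication well inside 64 bits. This is the kind of scaling the chrono.cc patch mentioned above would avoid, at the cost of carrying a libstdc++ patch.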
