lexborisov / modest

Modest is a fast HTML renderer implemented as a pure C99 library with no outside dependencies.

License: GNU Lesser General Public License v2.1

Languages: Makefile 0.24%, C 96.52%, HTML 0.54%, Perl 2.53%, CMake 0.16%
Topics: html-parser, css-parser, html-renderer, c, css, html, css-selector, pure-c

modest's Issues

wrong dll name

The DLL name should be libmodest-0.dll, not libmodest.0.dll.

myfont/myfont.c

myfont.c uses fopen instead of the myport/mycore/io.c abstraction layer.

adding unit tests

Unit tests are essential for verifying the code and catching regressions.

Use the 'check' unit testing library, or this simple code
(suite2.zip) as an example.

Future of Modest

Hi there,
Thanks for creating Modest and lexbor!

It seems that you now have a new project, lexbor, that aims to do what Modest does. With that in mind, what is the future of this repository?

If this repository is deprecated (the last commit was roughly two years ago), why not archive it and point to lexbor?

I originally came from rushter/selectolax#82. If we deprecate this repo, we would have a stronger argument for removing modest from selectolax.

Report some bugs

Hi @lexborisov
I think mycss may have some bugs.
I used css_low_level.c to parse and serialize a CSS string, and after serialization I found two problems.

  1. Some content is missing from the serialized CSS, for example:

@font-face {font-family: "iconfont"; src: url('/v2/iconfont/iconfont.eot'); /* IE9*/ src: url('/v2/iconfont/iconfont.eot?#iefix') format('embedded-opentype'), /* IE6-IE8 */ url('/v2/iconfont/iconfont.woff') format('woff'), /* chrome、firefox */ url('/v2/iconfont/iconfont.ttf') format('truetype'), /* chrome、firefox、opera、Safari, Android, iOS 4.2+*/ url('/v2/iconfont/iconfont.svg#iconfont') format('svg'); /* iOS 4.1- */ }

  2. Some characters cannot be serialized, for example:
    .icon-my:before { content: "\e686"; } .icon-cascades:before { content: "\e67c"; } .icon-share:before { content: "\e6f3"; } .icon-time:before { content: "\e65f"; } .icon-ttpodicon:before { content: "\e667"; } .icon-shujulaiyuan:before { content: "\e7d4"; }
    becomes
    .icon-more::before {content: ;} .icon-myfill::before {content: ;} .icon-my::before {content: ;} .icon-cascades::before {content: ;} .icon-share::before {content: ;} .icon-time::before {content: ;} .icon-ttpodicon::before {content: ;} .icon-shujulaiyuan::before {content: ;}

The source CSS comes from m.sh.bendibao.com/mip/130913.html, inside the mip-custom tag.

longjmp/setjmp in myhtml/serialization.c

void myhtml_serialization_reallocate(mycore_string_raw_t *str, size_t size) uses longjmp/setjmp. When Modest is called from C++, this can lead to undefined behavior such as leaked resources, because a longjmp across C++ stack frames skips their destructors.

Modest can't be used with C++ because of typedefs that come before the actual enum declarations :(

external/Modest/include/mycss/values/values.h:37:14: error: use of enum ‘mycss_values_text_decoration_line’ without previous declaration
 typedef enum mycss_values_text_decoration_line mycss_values_text_decoration_line_t;
              ^
external/Modest/include/mycss/values/values.h:37:83: error: invalid type in declaration before ‘;’ token
 typedef enum mycss_values_text_decoration_line mycss_values_text_decoration_line_t;
                                                                                   ^
external/Modest/include/mycss/values/values.h:38:14: error: use of enum ‘mycss_values_text_decoration_skip’ without previous declaration
 typedef enum mycss_values_text_decoration_skip mycss_values_text_decoration_skip_t;
              ^
external/Modest/include/mycss/values/values.h:38:83: error: invalid type in declaration before ‘;’ token
 typedef enum mycss_values_text_decoration_skip mycss_values_text_decoration_skip_t;
                                                                                   ^
external/Modest/include/mycss/values/values.h:41:14: error: use of enum ‘mycss_values_color_type’ without previous declaration
 typedef enum mycss_values_color_type mycss_values_color_type_t;
              ^
external/Modest/include/mycss/values/values.h:41:63: error: invalid type in declaration before ‘;’ token
 typedef enum mycss_values_color_type mycss_values_color_type_t;
                                                               ^
external/Modest/include/mycss/values/values.h:42:14: error: use of enum ‘mycss_values_color_type_value’ without previous declaration
 typedef enum mycss_values_color_type_value mycss_values_color_type_value_t;
              ^
external/Modest/include/mycss/values/values.h:42:75: error: invalid type in declaration before ‘;’ token
 typedef enum mycss_values_color_type_value mycss_values_color_type_value_t;
                                                                           ^
external/Modest/include/mycss/values/values.h:52:14: error: use of enum ‘mycss_values_font_family_type’ without previous declaration
 typedef enum mycss_values_font_family_type mycss_values_font_family_type_t;
              ^
external/Modest/include/mycss/values/values.h:52:75: error: invalid type in declaration before ‘;’ token
 typedef enum mycss_values_font_family_type mycss_values_font_family_type_t;

In C++, this doesn't work:

typedef enum a b;
enum a {A = 1};
azq2@zhumarin:~/проекты/symfony$ gcc  test.c
azq2@zhumarin:~/проекты/symfony$ g++ test.c
test.c:1:14: error: use of enum ‘a’ without previous declaration
 typedef enum a b;
              ^
test.c:1:17: error: invalid type in declaration before ‘;’ token
 typedef enum a b;
                 ^

Improved OS abstraction

Hi Lexbor,

It would be nice if the OS abstraction were split into separate files per operating system, instead of relying on #ifdef statements, which can be cumbersome to maintain or extend.

os/windows
os/linux
os/macos
os/bsd
os/pickyourpoison

I see four obvious abstraction targets: memory, threading, I/O, and timers.

MODEST_PORT_NAME not used in source code

Hello

MODEST_PORT_NAME is passed to the preprocessor in the Makefile:

MODEST_CFLAGS += -DMODEST_PORT_NAME=$(MODEST_PORT_NAME)

but this macro is not used in any source file. Is it really useful to pass it to the preprocessor?

selectors work only for html node?

In the selectors example, if I change some parts
like this:

diff --git a/examples/selectors/selectors_low_level.c b/examples/selectors/selectors_low_level.c
index 084abc8..db10f1d 100644
--- a/examples/selectors/selectors_low_level.c
+++ b/examples/selectors/selectors_low_level.c
@@ -71,8 +71,8 @@ mycss_entry_t * create_css_parser(void)
 
 int main(int argc, const char * argv[])
 {
-    const char *html = "<div><p id=p1><p id=p2><p id=p3><a>link</a><p id=p4><p id=p5><p id=p6></div>";
-    const char *selector = "div > :nth-child(2n+1):not(:has(a))";
+    const char *html = "<div><p id=p1><p id=p2 class=jo><p id=p3><a>link</a><span id=bla><p id=p4 class=jo><p id=p5 class=bu><p id=p6 class=jo></span></div>";
+    const char *selector = ".jo";
     
     myhtml_tree_t *html_tree = parse_html(html, strlen(html));
     mycss_entry_t *css_entry = create_css_parser();
@@ -82,7 +82,11 @@ int main(int argc, const char * argv[])
     
     mycss_selectors_list_t *list = mycss_selectors_parse(mycss_entry_selectors(css_entry), MyHTML_ENCODING_UTF_8, selector, strlen(selector), &out_status);
     
-    myhtml_collection_t *collection = modest_finder_by_selectors_list(finder, list, html_tree->node_html, NULL);
+    myhtml_tag_index_t *tag_index = myhtml_tree_get_tag_index(html_tree);
+    myhtml_tag_index_node_t *index_node = myhtml_tag_index_first(tag_index, MyHTML_TAG_SPAN);
+    myhtml_tree_node_t *span = myhtml_tag_index_tree_node(index_node);
+
+    myhtml_collection_t *collection = modest_finder_by_selectors_list(finder, list, span, NULL);

The result is empty. Is this expected, or a bug?

make clean fails

make clean uses 'rm -f' to remove the subdirectories in bin/, but 'rm -rf' is needed for them.

Handling of malformed iframe tags

I've noticed a pretty annoying problem on some websites (I think there are at least a thousand of them in the Alexa top 1M).

An unclosed iframe tag breaks all the HTML below it.

Here is an example:

<noscript>
    <iframe
            height="0" width="0" data-src="https://www.googletagmanager.com/ns.html?id=GTM-M5RK4MW" class="lazyload"
            src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==">
        <noscript>
            <iframe src="https://www.googletagmanager.com/ns.html?id=GTM-M5RK4MW"
                    height="0" width="0">
        </noscript>
    </iframe>
</noscript>

It's missing a closing iframe tag, but Modest still parses it correctly.

But for some reason, if you open it in Chrome (to render the JavaScript parts) and dump the HTML, you get this:

<noscript>
<iframe 
      height="0" width="0" data-src="https://www.googletagmanager.com/ns.html?id=" class="lazyload"
       src="data:image/gif;base64,R0lGODlhAQ">
       <noscript>
        <iframe src="https://www.googletagmanager.com/ns.html?id=" height="0" width="0">
</noscript>

Now neither iframe has a closing tag.

The problem is that Modest ignores everything after such a tag:

<noscript>
<iframe data-src="https://www.googletagmanager.com/ns.html?id=">
</noscript>


<script></script>
<script></script>
<script></script>

Searching for script nodes with myhtml_get_nodes_by_name or with CSS selectors returns no results.

@lexborisov Are there any ways to improve this? Other parsers can still handle this.

any test-browser

Hi!

First: thanks for your time and contributions!

I finally compiled the project (on Windows 10)
and can run the test .exe files,

but I would like to test the rendering engine, like a mini browser.
Is that possible?

Regards
Frigyes

clang11 report error: cast to smaller integer type

modest/source/mycss/selectors/function_parser.c:469:57: error: cast to smaller integer type 'mycss_selectors_function_drop_type_t' (aka 'enum mycss_selectors_function_drop_type') from 'void *' [-Werror,-Wvoid-pointer-to-enum-cast]
        mycss_selectors_function_drop_type_t drop_val = mycss_selector_value_drop(selector->value);
                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
modest/include/mycss/selectors/value.h:28:41: note: expanded from macro 'mycss_selector_value_drop'
#define mycss_selector_value_drop(obj) ((mycss_selectors_function_drop_type_t)(obj))
                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
[70/138] Building C object CMakeFiles/modest_static.dir/source/mycss/selectors/serialization.c.o
FAILED: CMakeFiles/modest_static.dir/source/mycss/selectors/serialization.c.o 
clang -DMyCORE_BUILD_WITHOUT_THREADS -Imodest/include -O3 -DNDEBUG   -Wall -Werror -pipe -pedantic -Wno-unused-variable -Wno-unused-function -std=gnu99 -MD -MT CMakeFiles/modest_static.dir/source/mycss/selectors/serialization.c.o -MF CMakeFiles/modest_static.dir/source/mycss/selectors/serialization.c.o.d -o CMakeFiles/modest_static.dir/source/mycss/selectors/serialization.c.o   -c modest/source/mycss/selectors/serialization.c
modest/source/mycss/selectors/serialization.c:183:69: error: cast to smaller integer type 'mycss_selectors_function_drop_type_t' (aka 'enum mycss_selectors_function_drop_type') from 'void *' [-Werror,-Wvoid-pointer-to-enum-cast]
                    mycss_selectors_function_drop_type_t drop_val = mycss_selector_value_drop(selector->value);
                                                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
modest/include/mycss/selectors/value.h:28:41: note: expanded from macro 'mycss_selector_value_drop'
#define mycss_selector_value_drop(obj) ((mycss_selectors_function_drop_type_t)(obj))
                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
[96/138] Building C object CMakeFiles/modest_static.dir/source/myencoding/encoding.c.o

When serializing, border/padding values are not written

    Node: <img style="border:0px" src="ololo" alt>
    Declaration: border: ;

Here border was skipped entirely for some reason:

    Node: <img style="border: 1px; padding: 1px; padding: 1%;" src="ololo" alt>
    Declaration: padding: 1px; padding: ;

For testing I used the code from examples/declarations/style_attr_low_level.c.

Or is this simply not finished yet, and the behavior is expected?

build error

Hi. Trying to build the current git version of modest leads to the following error:

modest/css_property_to_node.c: In function ‘main’:
modest/css_property_to_node.c:114:21: error: variable ‘status’ set but not used [-Werror=unused-but-set-variable]
  modest_status_t status = modest_init(modest);

The error comes from building the examples folder. I know that modest sets -Wno-unused-variable, so I was confused; however, GCC treats -Wunused-but-set-variable as a separate warning that this flag does not suppress.

split source/myport/posix/mycore/perf.c

To avoid spaghetti code with plenty of #ifdefs, splitting perf.c into several OS-specific files might be better for maintenance:

perf_common.c (code common to all the OS)
perf_linux.c
perf_darwin.c
perf_freebsd.c

Also, I noticed that myport/posix and myport/windows_nt contain common code, such as memory.c.

This could perhaps be handled by simplifying the tree:

source/myport/memory.c
source/myport/perf_common.c (perf code common to all the OS, including windows)
source/myport/perf_linux.c
source/myport/perf_darwin.c
source/myport/perf_freebsd.c
source/myport/perf_windows.c
etc...

name of shared libraries

If the version is VMAJ.VMIN.VMIC:
on Windows: libmodest-$(VMAJ).dll, with libmodest.dll.a as the import library
on UNIX: libmodest-$(VMAJ).$(VMIN).$(VMIC).so, with libmodest.a as the static library

On UNIX, there should also be symbolic links named libmodest-$(VMAJ).$(VMIN).so and libmodest-$(VMAJ).so.

Python bindings clarification

Hi !

I've just seen the link to the Python binding you added! (Kudos to @rushter, by the way.)
Would you mind specifying that this is a Cython binding, and not exactly Python?

have a debug and release mode

In debug mode, these flags should be passed to the compiler: -g3 -ggdb3 -O0 -fno-omit-frame-pointer (to allow good stack unwinding in gdb or other debuggers).

In release mode, at least -fomit-frame-pointer.

What doesn't it do?

Fantastic repo, thanks for sharing! Reading the homepage, it has multithreading and basic HTML and CSS support, so what doesn't it do? I'd suggest a tick-box list; it would be a really great way to communicate this:

HTML 3 spec

  • TABLE
  • TR
  • TD
    ...

HTML5 Spec

  • video
  • sourceset
    ...

You could even strike through things that are not on the roadmap to communicate "Hey, I don't intend to work on this". It could also help attract the investment you mentioned, since my understanding is that tech investors like a big-picture overview of what a project does, where it's going, and its backstory.

version should not be 1.0.0

Version 1.0.0 means that the project is released and that the API should no longer change.

The previous version was 0.0.5, which is fine. Update the version in the Makefile to 0.0.6 and update the README accordingly.

Update it to 0.1.0 once rendering is done.

Cmake support (patch to enable it)

Hi Lex,

I have attached a patch which enables CMake support. I noticed that you started another project, but it might be useful to still support this one, at least until the other project reaches feature parity.

Maybe it could also prove useful as inspiration for that other project? :)

The patch adds modern CMake support, and is validated on both Linux and Windows. On Windows, it works with both mingw (gcc) and the Visual Studio compilers.

Features:

  • ability to select whether to build the shared library and the static library (-DMODEST_BUILD_SHARED, and -DMODEST_BUILD_STATIC)
  • ability to select the C standard used by the compiler, and to reject using a lower standard version if the requested one is not supported (default is C99)
  • ability to add a suffix to debug binaries, by default "d";
  • ability to select whether to install the header files or not
  • ability to select whether to use the pthreads library or not; pthreads is linked only internally and not exposed to users of the modest library
  • automatically retrieve library version from header files
  • modern way of specifying target compiler definitions, options, include directories, with different settings for building the libraries and for installing the libraries
  • define the targets in a namespace (modest::modest_shared, and modest::modest_static)
  • install targets with proper creation of cmake config files at install location that allow using the libraries by simply specifying "find_package(modest)"
  • ability to pack the libraries for distribution
  • for the visual studio compiler, creation of exports for the shared library
  • for the binaries built with the Visual Studio compiler, install also the pdb file (with the debug symbols)
  • out-of-source builds

Notes:

  • the installation will install a cmake folder inside the lib folder, with cmake configuration files: a modestConfig, a modestConfigVersion, as well as modest.cmake, which defines the import targets needed by users of the modest libraries, and a modest-xxxx.cmake file for each build configuration (Debug, Release, etc.) that lets a project using the modest libraries link to the debug binaries when it's built in debug mode, or to the release binaries when it's built in release mode
  • the debug suffix is useful for installing both the release and debug binaries at the same location; this allows a project that links the modest libraries to use either the debug or release binaries automatically, with no additional configuration; which are the binaries for debug or for release is specified in the installed cmake configuration files

A project that wants to use the installed libraries only has to add the modest installation prefix to its CMAKE_PREFIX_PATH (config-mode find_package does not search CMAKE_MODULE_PATH), and specify this in its CMakeLists.txt:

find_package(modest CONFIG REQUIRED)
# ...
target_link_libraries(my_app
$<IF:$<BOOL:${BUILD_SHARED_LIBS}>,modest::modest_shared,modest::modest_static>
)

The above will let the linking to the dynamic or the static library be controlled by the CMake global flag BUILD_SHARED_LIBS.

When "my_app" is built in Debug configuration, the modest Debug binaries will be linked in automatically. When "my_app" is built in Release configuration, the modest Release binaries will be linked in instead, etc.

As you can see, this is the modern CMake way, with no need for include_directories and such.

To manually enforce the linking to the shared library, one would replace the above with:
target_link_libraries(my_app
modest::modest_shared
)

Or for enforcing the linking with the static library:
target_link_libraries(my_app
modest::modest_static
)

This CMake support patch does not, however, build the examples or the tests.
cmake_support.zip

collection bug

    str = "<html>" + "<div class=A>ooo</div>" * 20000 + "</html>"
    parser = Myhtml::Parser.new(str)

    c = 0
    parser.css("div").each do |node|
      c += 1 if node.attribute_by("class") == "A"
    end
    p c

Output: 1566
Expected: 20000

Is this the same bug as lexborisov/myhtml#84?
