Comments (7)

douglarek commented on June 26, 2024

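One problem with the forced cast is that the resulting byte slice ends up with a very large cap, which is bad; see for example https://play.golang.org/p/_tqfAgxlZAv . So the crude cast is not advisable, because the cap of the byte slice cannot be obtained that way. A better-performing implementation is the one in fasthttp ( https://github.com/valyala/fasthttp/blob/c48d3735fa9864a7c1724168812f3571c8313581/bytesconv.go#L387 ).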
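
For comparison, on Go 1.20 or later a zero-copy conversion with a well-defined cap can be sketched with unsafe.StringData and unsafe.Slice. This is not the fasthttp code linked above; stringToBytes is a hypothetical name used only for illustration, and the returned slice must still be treated as read-only because it shares memory with the string.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytes (hypothetical helper) returns a []byte that shares memory
// with s without copying. It assumes Go 1.20+ for unsafe.StringData.
// The result must never be written to, because strings are immutable.
func stringToBytes(s string) []byte {
	if s == "" {
		return nil // unsafe.StringData is unspecified for the empty string
	}
	// unsafe.Slice builds a slice whose len and cap are both len(s).
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := stringToBytes("Roger")
	fmt.Println(len(b), cap(b)) // 5 5
}
```

Because the cap is set explicitly, this avoids the oversized cap described above while still skipping the copy.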

from go-questions.

changkun commented on June 26, 2024
  1. Yes, the official documentation already describes this problem: "the Data field is not sufficient to guarantee the data it references will not be garbage collected, so programs must keep a separate, correctly typed pointer to the underlying data." -- https://golang.org/pkg/reflect/#SliceHeader
    The original code is incorrect.

  2. It does not need to be this complicated. You can switch directly to an unsafe cast, which is also more efficient:

func string2bytes(s string) []byte {
	return *(*[]byte)(unsafe.Pointer(&s))
}

Appendix: performance comparison

// main.go
package main

import (
	"reflect"
	"unsafe"
)

func string2bytes1(s string) []byte {
	stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))

	var b []byte
	pbytes := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	pbytes.Data = stringHeader.Data
	pbytes.Len = stringHeader.Len
	pbytes.Cap = stringHeader.Len

	return b
}

func string2bytes2(s string) []byte {
	return *(*[]byte)(unsafe.Pointer(&s))
}
// main_test.go
package main

import (
	"fmt"
	"math/rand"
	"reflect"
	"testing"
	"time"
)

func TestString2Bytes(t *testing.T) {
	s := "qcrao/Go-Questions"
	if string(string2bytes2(s)) != s {
		t.Fatalf("string2bytes2 is not properly implemented")
	}
	if !reflect.DeepEqual(string2bytes1(s), string2bytes2(s)) {
		t.Fatalf("strings2bytes implementation does not match")
	}
}

func init() {
	rand.Seed(time.Now().UnixNano())
}

var letterRunes = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

func genstring(n int) string {
	b := make([]rune, n)
	for i := range b {
		b[i] = letterRunes[rand.Intn(len(letterRunes))]
	}
	return string(b)
}

func BenchmarkString2Bytes(b *testing.B) {
	funcs := map[string]func(string) []byte{
		"string2bytes1": string2bytes1,
		"string2bytes2": string2bytes2,
	}

	for name, f := range funcs {
		for i := 1; i <= 10000; i *= 10 { // include 10000, matching the results below
			s := genstring(i)
			b.Run(fmt.Sprintf("%v-%v", name, i), func(b *testing.B) {
				for i := 0; i < b.N; i++ {
					f(s)
				}
			})
		}

	}
}
$ go test -v -run=none -bench=. -benchmem -count=10 . | tee bench.txt
$ benchstat bench.txt

name                                 time/op
String2Bytes/string2bytes1-1-12      3.07ns ± 1%
String2Bytes/string2bytes1-10-12     3.08ns ± 2%
String2Bytes/string2bytes1-100-12    3.08ns ± 1%
String2Bytes/string2bytes1-1000-12   3.08ns ± 0%
String2Bytes/string2bytes1-10000-12  3.07ns ± 1%
String2Bytes/string2bytes2-1-12      1.95ns ± 2%
String2Bytes/string2bytes2-10-12     1.95ns ± 2%
String2Bytes/string2bytes2-100-12    1.94ns ± 1%
String2Bytes/string2bytes2-1000-12   1.95ns ± 1%
String2Bytes/string2bytes2-10000-12  1.96ns ± 3%

name                                 alloc/op
String2Bytes/string2bytes1-1-12       0.00B     
String2Bytes/string2bytes1-10-12      0.00B     
String2Bytes/string2bytes1-100-12     0.00B     
String2Bytes/string2bytes1-1000-12    0.00B     
String2Bytes/string2bytes1-10000-12   0.00B     
String2Bytes/string2bytes2-1-12       0.00B     
String2Bytes/string2bytes2-10-12      0.00B     
String2Bytes/string2bytes2-100-12     0.00B     
String2Bytes/string2bytes2-1000-12    0.00B     
String2Bytes/string2bytes2-10000-12   0.00B

luojiego commented on June 26, 2024

@changkun Strictly speaking, the string2bytes2 conversion function is wrong, because cap is never properly assigned during the conversion.

package main

import (
	"fmt"
	"reflect"
	"runtime"
	"unsafe"
)


func string2bytes1(s string) []byte {
	stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))

	var b []byte
	pBytes := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	pBytes.Data = stringHeader.Data
	pBytes.Len = stringHeader.Len
	pBytes.Cap = stringHeader.Len

	runtime.KeepAlive(s)
	return b
}

func string2bytes2(s string) []byte {
	return *(*[]byte)(unsafe.Pointer(&s))
}

func main() {
	s1 := string2bytes1("Roger")
	fmt.Println(s1)
	fmt.Println(len(s1))
	fmt.Println(cap(s1))
	s2 := string2bytes2("Roger")
	fmt.Println(s2)
	fmt.Println(len(s2))
	fmt.Println(cap(s2))
}

The cap printed for s2 will be an arbitrary value.

[82 111 103 101 114]
5
5
[82 111 103 101 114]
5
4840475
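
One way to see why the cap matters is append. When cap equals len, append has to allocate a new backing array, so the string's memory is never written; with an oversized cap, append could instead write past the end of the string data. The sketch below exercises only the safe case, using a local copy of string2bytes1 from the program above (reflect.StringHeader and reflect.SliceHeader are deprecated in recent Go releases but still compile).

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// string2bytes1 is the header-copying conversion from the program above:
// it sets Cap explicitly to the string length.
func string2bytes1(s string) []byte {
	sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
	var b []byte
	bh := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	bh.Data = sh.Data
	bh.Len = sh.Len
	bh.Cap = sh.Len
	return b
}

func main() {
	s := "Roger"
	b := string2bytes1(s)

	// cap(b) == len(b), so append cannot grow in place and must copy.
	grown := append(b, '!')

	fmt.Println(string(grown))      // Roger!
	fmt.Println(s)                  // Roger (unchanged)
	fmt.Println(&grown[0] != &b[0]) // true: grown has its own backing array
}
```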

changkun commented on June 26, 2024

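@luojiego Sorry, but I think this is the implementer's decision rather than a question of correctness. If we want to talk about "strictly speaking", you should not write this kind of implementation at all: either do an honest conversion that copies, or use bytes.Buffer from the standard library.

Also, the runtime.KeepAlive(s) in string2bytes1 is unnecessary.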
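
A minimal sketch of the two copying alternatives mentioned here, assuming nothing beyond the standard library. copyBytes is a hypothetical name for illustration; both routes work on their own copy of the data, so the original string is unaffected by later writes.

```go
package main

import (
	"bytes"
	"fmt"
)

// copyBytes (hypothetical helper) uses the built-in conversion, which
// allocates a new backing array and copies the bytes of s into it.
func copyBytes(s string) []byte {
	return []byte(s)
}

func main() {
	s := "Roger"

	// The copy is safe to modify; s is not affected.
	b := copyBytes(s)
	b[0] = 'r'
	fmt.Println(string(b), s) // roger Roger

	// bytes.Buffer route: NewBufferString also works on its own copy of s.
	buf := bytes.NewBufferString(s)
	buf.WriteString("!")
	fmt.Println(buf.String(), s) // Roger! Roger
}
```

The cost is one allocation and one copy per conversion, which is the trade-off the unsafe variants in this thread try to avoid.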

luojiego commented on June 26, 2024

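> @luojiego Sorry, but I think this is the implementer's decision rather than a question of correctness. If we want to talk about "strictly speaking", you should not write this kind of implementation at all: either do an honest conversion that copies, or use bytes.Buffer from the standard library.
>
> Also, the runtime.KeepAlive(s) in string2bytes1 is unnecessary.

OK, thank you very much!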

techone577 commented on June 26, 2024

Why is the cap value so large? From the assembly it looks as if cap holds the address of the string's Data, but this does not reproduce reliably.

luojiego commented on June 26, 2024

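> Why is the cap value so large? From the assembly it looks as if cap holds the address of the string's Data, but this does not reproduce reliably.

src/reflect/value.go contains the underlying struct definitions for string and []byte. Because []byte has an extra Cap field compared with string, converting a string directly to a slice with the unsafe package inevitably leaves Cap without a correct value.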

// StringHeader is the runtime representation of a string.
// It cannot be used safely or portably and its representation may
// change in a later release.
// Moreover, the Data field is not sufficient to guarantee the data
// it references will not be garbage collected, so programs must keep
// a separate, correctly typed pointer to the underlying data.
type StringHeader struct {
    Data uintptr
    Len  int
}

// SliceHeader is the runtime representation of a slice.
// It cannot be used safely or portably and its representation may
// change in a later release.
// Moreover, the Data field is not sufficient to guarantee the data
// it references will not be garbage collected, so programs must keep
// a separate, correctly typed pointer to the underlying data.
type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}
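
The size difference between the two headers can be checked directly. The sketch below (headerSizes is a hypothetical helper) measures both values in machine words: a string occupies two words and a slice three, so the direct cast reads one word beyond the string value. That third word is whatever happens to sit next to the string in memory, which would explain why the observed cap varies between runs and builds.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerSizes (hypothetical helper) reports the in-memory size of a
// string value and of a []byte value, in bytes.
func headerSizes() (str, slice uintptr) {
	var s string
	var b []byte
	return unsafe.Sizeof(s), unsafe.Sizeof(b)
}

func main() {
	str, slice := headerSizes()
	word := unsafe.Sizeof(uintptr(0))
	fmt.Println(str/word, slice/word) // 2 3: the slice header has one extra word
}
```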
