Slower than std? #24

Open
jonashaag opened this issue Oct 10, 2023 · 4 comments

Comments

@jonashaag

I was debugging this and wanted to write a reproducer. For some reason, in my tests std is always faster at parsing than this library. Is anything wrong with my benchmark?

Generating the input (Python):

>>> import random
>>> open("/tmp/numbers","w").write("\n".join(str(random.randint(0, 1000000)) for _ in range(10000000)))

Running the benchmark:

cargo run --example bench -r < /tmp/numbers

The benchmark itself:

use std::io;
use std::str;
use std::str::FromStr;
use std::time::Instant;

use atoi::FromRadix10Signed;

fn main() {
    let mut buf_digits: Vec<u8> = Vec::new();
    let mut sum = 0;
    let texts: Vec<Vec<u8>> = io::stdin()
        .lines()
        .map(|l| l.unwrap().as_bytes().into())
        .collect();
    let now = Instant::now();
    for text in texts {
        let num = if true {
            // from_str
            let utf8 = str::from_utf8(&text).unwrap();
            i128::from_str(utf8).unwrap()
        } else if false {
            // from_str with . filter
            let utf8 = str::from_utf8(&text).unwrap();
            buf_digits.clear();
            buf_digits.extend(utf8.as_bytes().into_iter().filter(|&&c| c != b'.'));
            i128::from_str(str::from_utf8(&buf_digits).unwrap()).unwrap()
        } else if false {
            // from_radix
            let (num, _consumed) = i128::from_radix_10_signed(&text);
            num
        } else if false {
            // from_radix with . filter
            buf_digits.clear();
            buf_digits.extend(text.into_iter().filter(|&c| c != b'.'));
            let (num, _consumed) = i128::from_radix_10_signed(&buf_digits);
            num
        // } else if true {
        //     // atoi_radix10
        //     let utf8 = str::from_utf8(&text).unwrap();
        //     atoi_radix10::parse_from_str(utf8).unwrap()
        } else {
            panic!("select a benchmark");
        };
        if num < 0 {
            dbg!(num);
        }
        sum += num;
    }
    let elapsed_time = now.elapsed();
    println!("Took {} ms.", elapsed_time.as_millis());
    dbg!(sum);
}
@pacman82
Owner

This crate comes with a criterion benchmark suite. I would suggest checking out the repository and running it with cargo bench; you even get nice plots. I added a commit with some benchmarks for i128. This crate is less about crazy optimizations and more about avoiding the detour via UTF-8. On my system this seems to pay off. Your mileage may vary.
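
To make the "detour via UTF-8" concrete, here is a minimal std-only sketch of the two routes. `parse_i128_from_bytes` and `parse_i128_via_utf8` are hypothetical names chosen for illustration; the first is not the atoi crate's implementation, only a simplified parser with a similar "value plus bytes consumed" shape, and it does no overflow checking.

```rust
use std::str;

/// Hypothetical helper, NOT the atoi crate's code: parses an optional sign
/// followed by ASCII digits directly from bytes. Returns the value and the
/// number of bytes consumed, or (0, 0) if no digit was found.
/// Assumption: overflow is not checked in this sketch.
fn parse_i128_from_bytes(text: &[u8]) -> (i128, usize) {
    let (negative, mut index) = match text.first().copied() {
        Some(b'-') => (true, 1usize),
        Some(b'+') => (false, 1usize),
        _ => (false, 0usize),
    };
    let sign_len = index;
    let mut value: i128 = 0;
    while let Some(&byte) = text.get(index) {
        if !byte.is_ascii_digit() {
            break; // stop at the first non-digit, e.g. a newline or '.'
        }
        let digit = i128::from(byte - b'0');
        // Accumulate towards the final sign so i128::MIN stays representable.
        value = if negative { value * 10 - digit } else { value * 10 + digit };
        index += 1;
    }
    if index == sign_len { (0, 0) } else { (value, index) }
}

/// The detour: validate the bytes as UTF-8 first, then call std's parser.
/// Unlike the helper above, this rejects any trailing non-digit bytes.
fn parse_i128_via_utf8(text: &[u8]) -> Option<i128> {
    str::from_utf8(text).ok()?.parse::<i128>().ok()
}
```

Note that the two routes differ in strictness as well as in cost: the byte parser stops at the first non-digit and reports how much it consumed, while the std route fails on any input that is not entirely a number.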

@jonashaag
Author

Here's a benchmark suite that demonstrates the difference:

use atoi::{FromRadix10, FromRadix10Checked, FromRadix10Signed, FromRadix16, FromRadix16Checked};
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use std::str;
use std::str::FromStr;

use std::fs::read_to_string;

pub fn i128_signed_four_digit_number(c: &mut Criterion) {
    c.bench_function("signed i128 four digit number", |b| {
        let lines: Vec<Vec<u8>> = read_to_string("/tmp/numbers")
            .unwrap()
            .lines()
            .map(|l| l.as_bytes().into())
            .collect();
        b.iter(|| {
            black_box(&lines)
                .iter()
                .map(|l| i128::from_radix_10_signed(l).0)
                .collect::<Vec<_>>()
        })
    });
}

pub fn i128_through_utf8(c: &mut Criterion) {
    c.bench_function("i128 via UTF-8", |b| {
        let lines: Vec<Vec<u8>> = read_to_string("/tmp/numbers")
            .unwrap()
            .lines()
            .map(|l| l.as_bytes().into())
            .collect();
        b.iter(|| {
            black_box(&lines)
                .iter()
                .map(|l| {
                    let s = str::from_utf8(l).unwrap();
                    s.parse::<i128>().unwrap();
                    //i128::from_str(s).unwrap();
                    //atoi_radix10::parse_from_str(s).unwrap::<i128>();
                })
                .collect::<Vec<_>>()
        })
    });
}

criterion_group!(benches, i128_signed_four_digit_number, i128_through_utf8,);
criterion_main!(benches);

std is 4x faster. Is anything wrong with my benchmark?
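
One asymmetry in the two benchmark functions above may be worth checking; this is an observation about the posted code, not a confirmed explanation of the 4x gap. The closure in `i128_through_utf8` ends with a semicolon, so it returns `()` and the benchmark collects a `Vec<()>`, whose elements take no storage. The closure in `i128_signed_four_digit_number` returns an `i128`, so that benchmark fills a `Vec<i128>` (16 bytes per element, about 160 MB for 10,000,000 lines) on every iteration. The std-only sketch below reproduces the two closure shapes; the function names are hypothetical and chosen for illustration.

```rust
use std::mem::size_of;
use std::str;

/// Same shape as the closure in `i128_through_utf8`: the trailing semicolon
/// discards the parsed value, so every collected element is `()`.
fn collect_discarding(lines: &[Vec<u8>]) -> Vec<()> {
    lines
        .iter()
        .map(|l| {
            let s = str::from_utf8(l).unwrap();
            s.parse::<i128>().unwrap();
        })
        .collect()
}

/// Same closure without the semicolon: every collected element is an `i128`,
/// matching what the atoi benchmark collects.
fn collect_values(lines: &[Vec<u8>]) -> Vec<i128> {
    lines
        .iter()
        .map(|l| {
            let s = str::from_utf8(l).unwrap();
            s.parse::<i128>().unwrap()
        })
        .collect()
}

/// Bytes of element storage needed by a collected `Vec<T>` with `len` items.
fn payload_bytes<T>(len: usize) -> usize {
    len * size_of::<T>()
}
```

A fairer comparison would have both closures return the parsed value (or have both fold into a sum), so that the two benchmarks do the same amount of memory work per iteration.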

@pacman82
Owner

Not on the face of it. I wouldn't be able to reproduce it without your numbers, though.

@jonashaag
Author

See the first post; it's just a bunch of ints.


2 participants