This is an amazingly valuable product. I am very happy that you have decided to release this open-source — we will definitely be checking this out for our AI-powered tool!
@anthony_green2 thanks for the kind words! Open-source is the way!
Kudos to the Giskard team for the thoughtful and extensive development of Giskard 2.0, addressing the critical challenges in ML testing. The model-agnostic approach and compatibility with various Python ML ecosystems make it versatile.
As a suggestion, have you considered implementing a feature that allows users to share best practices or case studies within the Giskard community, fostering collective learning among AI practitioners?
@malkielfalcone thanks for the kind words! That's a great idea. This is actually in the works. There are 3 avenues to implement this: (1) we have a Discord community where we will be able to post some of these case studies and recommendations, (2) the tutorials section of our documentation tries to capture this, and (3) we're also working on hosting them on our main website. Glad to hear it's something that would be of interest!
Links to our docs: https://docs.giskard.ai/
Hey Giskard team, big congrats on the launch! Giskard 2.0 sounds like a real game-changer.
Just one question: How does Giskard handle the detection of ethical biases in ML models?
And a suggestion - how about a feature simulating the impacts of detected vulnerabilities? It could help teams make more informed decisions. Keep up the great work!
Hi @ke_ouyang, for traditional NLP models we mostly rely on metamorphic tests to detect ethical biases. For example, changing the input slightly by switching pronouns ("he" into "she" or vice versa), names, countries, religion terms, etc., and measuring the effect this has on the model output. (See for example https://docs.giskard.ai/en/lates...)
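The pronoun-swapping idea above can be sketched in plain Python. This is not Giskard's actual API — the function names, the swap table, and the tolerance are all illustrative, and the `model` is any callable mapping text to a score:

```python
def swap_pronouns(text):
    """Build a perturbed counterpart of the input by swapping gendered pronouns."""
    mapping = {"he": "she", "she": "he", "him": "her", "her": "him", "his": "her"}
    return " ".join(mapping.get(word.lower(), word) for word in text.split())

def metamorphic_invariance(model, texts, tolerance=0.05):
    """Flag inputs whose prediction shifts by more than `tolerance` after the swap.

    A fair model should give (nearly) the same output for the original and the
    pronoun-swapped text; large deltas point at a potential ethical bias.
    """
    failures = []
    for text in texts:
        delta = abs(model(text) - model(swap_pronouns(text)))
        if delta > tolerance:
            failures.append((text, delta))
    return failures
```

For example, a model whose score depends on the pronoun would be flagged on `"he is a doctor"`, while a model that ignores pronouns would pass with no failures.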
We also do various checks on subpopulations (data slices), for example checking that accuracy/precision/etc. of the model is not significantly different for certain groups (e.g. `gender = x` or other features in the data).
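The subpopulation check described above can be sketched as a simple accuracy-gap report. Again, a plain-Python illustration rather than Giskard's actual API — the function names and the gap threshold are assumptions:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def slice_accuracy_gap(y_true, y_pred, feature, max_gap=0.05):
    """Compare accuracy on each feature slice (e.g. gender = x) to overall accuracy.

    Returns the overall accuracy and, per slice value, its accuracy together
    with a flag set when the gap from the overall score exceeds `max_gap`.
    """
    overall = accuracy(y_true, y_pred)
    report = {}
    for value in set(feature):
        idx = [i for i, f in enumerate(feature) if f == value]
        slice_acc = accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx])
        report[value] = (slice_acc, abs(slice_acc - overall) > max_gap)
    return overall, report
```

A slice whose accuracy deviates sharply from the overall score (the flag set to `True`) is a candidate for closer inspection.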
For LLMs, instead, we try to elicit inappropriate behavior by crafting adversarial inputs and evaluating the model with an LLM-as-a-judge approach. You can find more details here: https://docs.giskard.ai/en/lates...
Finally, a solution for the ML testing bottleneck!
I'm curious to know, can Giskard handle testing for models trained on non-English languages effectively?
@aliceetayloor thanks for the comment! Yes, it handles non-English languages effectively, especially for LLM use-cases!
Hey Team Giskard!
Really pumped about Giskard 2.0 hitting the market! The automation of tests, compatibility with a variety of ML tools, and a strong commitment to standards: that sounds like the future of MLOps.
One thing to watch out for is ensuring that scaling to larger datasets and models doesn't compromise performance or ease of use. The potential downside I see is the challenge of integrating with clients' established systems. Convincing teams to switch to new tech can be tough, even when it's better.
How do you plan to tackle this adoption hurdle? And what cool features are you lining up to stay ahead of the curve?
@drsudba thanks for your feedback! We tackle this in 2 ways: (1) making the product open-source to lower the barrier to entry and (2) integrating with other popular tools in the ML world (MLflow, W&B, GitHub). Here's a link to our integrations page: https://docs.giskard.ai/en/lates...
2 years to develop an entire product. You guys are very diligent, and that's commendable. This deserves respect, so I will definitely support your product. Congratulations to the Giskard team on the launch!
@gipetto Thank you! Indeed, this has been a 2-year R&D effort, and we've built it all in the open, for the community! Hope it's valuable for your ML engineering projects.