The Fight Against Labeled Dataset Dependencies
The Scope covers the most relevant ML papers, real-world ML use cases, cool tech releases, and $ in AI. Weekly
Editorial
Supervised learning has dominated the world of machine learning (ML) for the last few decades. The predominance of supervised models in mainstream ML applications seems logical, considering that they are easier to model, interpret, and optimize than the non-supervised alternatives. However, supervised ML models have one big limitation: their dependency on large, labeled datasets, which are very expensive to build and maintain. This dependency on labeled data is not only a technological constraint but also an economic one, as it has made ML research a privilege of large organizations with access to highly curated datasets. In addition, supervised learning paradigms are not particularly good at generalizing across multiple tasks. Steadily decreasing the level of supervision in ML models is one of the paramount challenges for the next decade of ML. The ML industry recognizes this and is making massive inroads.
The last few years have seen an explosion of research and implementation efforts to reduce the dependency on labeled datasets. From pretrained models to semi-supervised and self-supervised learning paradigms, we regularly see lightly supervised models match and even outperform supervised alternatives across domains such as computer vision, language, and speech. Just this week, Facebook and Salesforce unveiled research efforts that leverage softer forms of supervision for speech analysis and code generation, respectively. In the next few years, we are likely to see these types of models transition from research efforts by big AI labs to mainstream ML applications.
TheSequence Scope, our Sunday edition with the industry's development overview, is free. To receive high-quality content about the most relevant developments in the ML world every Tuesday and Thursday, please subscribe to TheSequence Edge.
Next week in TheSequence Edge:
Edge#123: we start a new series about self-supervised learning; discuss the "Self-Supervised Learning, the Dark Matter of Artificial Intelligence" paper; explore VISSL, a framework for self-supervised learning in computer vision.
Edge#124: we do a deep dive into Pachyderm platform updates.
Now, let's review the most important developments in the AI industry this week.
ML Research
Code Intelligence
Salesforce Research published a paper detailing CodeT5, a pretrained programming language model that achieves state-of-the-art performance in 15 code intelligence tasks ->read more on Salesforce Research blog
Textless NLP
Facebook AI Research (FAIR) published a paper introducing a generative model that can master NLP tasks using raw audio files in almost any language ->read more on FAIR blog
Speech Recognition Models for Speech Impairment
Google Research released two papers and an open-source dataset to foster the implementation of speech recognition models that work for people with speech impairments ->read more on Google Research blog
Real World ML
Scaling Hadoop YARN at LinkedIn
The LinkedIn engineering team published a blog post detailing the architecture used to scale their Hadoop YARN infrastructure beyond 10,000 nodes ->read more on LinkedIn blog
Uber Jellyfish
Uber engineering published a blog post detailing the architecture behind its schemaless data storage infrastructure called Jellyfish ->read more on Uber engineering blog
Cool AI Tech Releases
JetBrains DataSpell
JetBrains announced the release of DataSpell, a new IDE optimized for data science programs ->read more on JetBrains blog
AWS S3 Plugin for PyTorch
Amazon released an S3 plugin for PyTorch, which enables the usage of S3 data buckets in PyTorch datasets ->read more on AWS engineering blog
TensorFlow Lite and XNNPACK
TensorFlow unveiled an extended integration with XNNPACK for faster inference with quantized models ->read more on TensorFlow blog
Money in AI
ML & AI & Quantum:
Database startup SingleStore raised $80 million in a Series F funding round led by Insight Partners. Hiring in the US/Portugal/Remote.
Quantum control hardware and software platform Quantum Machines raised a $50 million Series B round led by Red Dot Capital Partners. Hiring mostly in Israel.
Conversational AI startup PolyAI raised $14 million in a funding round led by Silicon Valley's Khosla Ventures. Hiring in the US and UK.
Computer vision training platform Mobius Labs raised a ~$6.1 million funding round led by Ventech VC. Hiring in Berlin.
AI-powered:
Relationship intelligence platform Affinity raised an $80 million Series C funding round led by Menlo Ventures. Hiring in SF/Toronto/Remote.
Fertility-focused women's health startup Flo raised a $50 million Series B round co-led by VNV Global and Target Global. Hiring worldwide.
Work insights platform Fin raised $20 million in Series A funding, led by Coatue. Hiring in the US.
Virtual meeting platform Vowel raised $13.5 million in a Series A round led by Lobby Capital. Hiring remotely.