Overview
What is Databricks Data Intelligence Platform?
Databricks in San Francisco offers the Databricks Lakehouse Platform (formerly the Unified Analytics Platform), a data science platform and Apache Spark cluster manager. The Databricks Unified Data Service aims to provide a reliable and scalable platform for data pipelines, data…
Most collaborative Data Science & AI workspace!
Databricks Lakehouse Platform: A 2-year user review
Databricks Lakehouse Platform for all your analytics requirements
Best in the industry
The wonders of all your data analysis in one place
Positive review for Databricks Lakehouse Platform
My Lakehouse experiences
Databricks is a Great Platform for Data Virtualization based on Delta Lake
Data for insights
Databricks Lakehouse is a modern solution for current big data problems
Databricks--a good all-rounder
Great for both ad-hoc analyses and scheduled jobs
Databricks for modern day ETL
Databricks provides a cost effective end to end solution for Enterprise analytics
Databricks Review
Pricing
Plan | Price |
---|---|
Standard | $0.07 |
Premium | $0.10 |
Enterprise | $0.13 |
Entry-level set up fee?
- No setup fee
Offerings
- Free Trial
- Free/Freemium Version
- Premium Consulting/Integration Services
Product Details
Databricks Data Intelligence Platform Technical Details
Attribute | Details |
---|---|
Deployment Types | Software as a Service (SaaS), Cloud, or Web-Based |
Operating Systems | Unspecified |
Mobile Application | No |
Reviews and Ratings
(75)
Community Insights
- Business Problems Solved
- Pros
- Cons
- Recommendations
The Databricks Lakehouse Platform, formerly known as the Unified Analytics Platform, has been widely used by multiple departments to address a range of data engineering and analytics challenges. Users have leveraged the platform to initiate data warehousing, SQL analytics, real-time monitoring, and data governance. The versatility and openness of the platform have allowed users to save a significant amount of time and effectively manage cloud costs and human resources.
Customers have utilized the Databricks Lakehouse Platform for various use cases, including creating dashboards with tools like Tableau, Redash, and Qlik, as well as integrating with CRM systems like Salesforce and SAP. The platform has also been employed for developing chatbots in Knowledge Management and serving machine learning models behind API endpoints. Furthermore, it is extensively used for data science project development, facilitating tasks such as data analysis, wrangling, feature creation, training, model testing, validation, and deployment.
Databricks' integration capabilities, including Git integration and integration with Azure or AWS, enable users to leverage the power of integrated machine learning features. Additionally, the platform's reliability and excellent technical support make it a preferred choice for building data pipelines and solving big data engineering problems. It is widely used by engineering and IT teams to transform IoT data, build data models for business intelligence tools, and run daily/hourly jobs to create BI models.
Moreover, the Databricks Lakehouse Platform serves as a learning tool for individuals in the Computer Information System department. The community forum proves particularly helpful for self-learners with questions. Furthermore, the platform supports deep-dive analysis of metrics by Data and Product teams, facilitates client reporting and analytics through data mining capabilities, and replaces traditional RDBMSs like Oracle for big batch ETL jobs on large data sets.
In summary, the Databricks Lakehouse Platform is employed across organizations to solve a variety of data engineering and analytics use cases. Its seamless integration with cloud platforms, support for different data formats, and scalability make it suitable for tasks such as data ingestion and cleansing, interactive analysis, and development of analytic services.
User-Friendly SQL: Users have found the SQL in Databricks to be user-friendly, allowing them to easily write and execute queries. Several reviewers have praised the intuitive nature of the SQL interface, making it accessible for users of different skill levels.
Enhanced Collaboration: The enhanced collaboration between data science and data engineering teams is seen as a positive feature by many users. They appreciate how Databricks facilitates seamless communication and knowledge sharing among team members, ultimately leading to improved productivity and efficiency.
Versatile Integration: The integration with multiple Git providers and the merge assistant is highly valued by users. This feature allows for smooth version control and simplifies the collaborative development process. With this capability, developers can easily manage their codebase, track changes, resolve conflicts, and ensure a streamlined workflow.
Confusing Workspace Navigation: Several users have found the navigation to create a workspace in the Databricks Lakehouse Platform confusing and time-consuming, hindering their productivity. They have expressed frustration over the complex steps involved, resulting in wasted time.
Difficulty Locating Tables: Many reviewers have expressed difficulty in locating tables after they were created, often leading to the need for deletion and recreation. This issue has caused frustration and wasted time for users who struggle to find their data within the platform.
Random Task Failures: Some users have experienced random task failures while using the platform, making it challenging for them to debug and profile code effectively. These unexpected failures undermine confidence in the system's stability and result in delays as users attempt to identify and fix these issues.
Users highly recommend the Lakehouse platform for various data-related tasks, such as building cloud-native lakehouse platforms, ingesting and transforming big data batches/streams, and implementing medallion lakehouse architectures. They find the platform simple to use and appreciate its hassle-free administration and maintenance.
The Lakehouse platform is also highly recommended for setting up Hadoop clusters and dealing with big data, analytics, and machine learning workflows. Users believe that it provides a comprehensive and open solution for these tasks.
Users suggest exploring the features of the Lakehouse platform, such as Partner Connect and the advanced analytics, MLOps, and data science AutoML capabilities. They find these features useful and believe that they enhance the platform's core functionality.
Overall, users highly recommend the Lakehouse platform for its ease of use, support for major cloud providers (AWS, Azure, GCP), and useful features like data sharing (Delta Sharing). However, users also recommend considering the level of reliance on proprietary technology versus industry standards like Spark, SQL, and dbt. It is advised to read through the documentation and gather firsthand experiences from individuals who have used the Lakehouse platform.
Reviews
(1-17 of 17)
Most collaborative Data Science & AI workspace!
* Creating dashboards with Tableau, Redash, and Qlik
* Feeding CRM tools like Salesforce and SAP
* Developing chatbots for Knowledge Management
* Serving ML models behind API endpoints
Databricks Lakehouse Platform is a versatile and open product that saves us a lot of time and helps us control cloud costs and human effort!
- Enhanced Data Science & Data Engineering collaboration
- Complete Infrastructure-as-code Terraform provider
- Very easy streaming capabilities
- Multiple Git providers integration with merge assistant
- VS Code IDE support for local development
- Python SDK for Workflows
- Poetry support
It would be less appropriate for very small data projects, as the entry cost may be high. Yet, if the data is meant to grow, Databricks will scale horizontally without requiring a rewrite of your codebase.
Databricks Lakehouse Platform: A 2-year user review
- MLflow Experiment
- MLflow Registry
- Databricks Lakehouse Platform Notebook
- Connecting my local code in Visual Studio Code to my Databricks Lakehouse Platform cluster so I can run the code on the cluster. The old databricks-connect approach has many bugs and is hard to set up. The new Databricks Lakehouse Platform extension for Visual Studio Code doesn't allow developers to debug their code line by line (we can only run the code).
- Maybe have a specific Databricks Lakehouse Platform IDE that can be used by Databricks Lakehouse Platform users to develop locally.
- Visualization in MLflow experiments can be enhanced
- Very well optimized Spark Jobs Execution Engine.
- Time travel in Databricks Lakehouse Platform allows you to version your datasets.
- Newly integrated Analytics feature allows you to build visualization dashboards.
- Native integration with managed MLflow service.
- Running MLflow jobs remotely is extremely cluttered and needs to be simplified.
- All the runnable code has to stay in Notebooks which are not very production-friendly.
- File management on DBFS can be improved.
Best in the industry
- Data science code agnostic (SQL, R, Python, PySpark, Scala)
- Customer service with REAL support from data engineers and data scientists
- Integration with many technologies: Tableau, Azure, AWS, Spark, etc.
- Visualization
- Collaboration
The wonders of all your data analysis in one place
- Cross company shared workspaces for unified comprehension of the data
- Combining different languages such as SQL and Python in one single space for data analysis
- Quick execution of highly complex queries
- Creating graphs requires a certain level of expertise in the platform; it could be more intuitive and user friendly
- More guidance on the basics, since some of the new users come from different platforms expecting a similar UI
- An option where all the tables are shown with their respective fields, when a DB is selected for a query
It is less appropriate for users who don't have full knowledge of the tables they are going to query and need more support with the data, since the platform doesn't give an option to see the fields in a table before querying it.
Positive review for Databricks Lakehouse Platform
- Scheduling jobs to automate queries
- User friendly - a new user can easily navigate through SQL/Python queries
- Options to code in multiple languages (SQL, Python, Scala, R) and easy to switch with the use of the % operator
- Errors can be difficult to understand at times
- Session resets automatically at times, which leads to the temporary tables being wiped out from memory
- Git connections are dicey
- Very inconsistent with job success/failure notification emails
The ability to store temporary/permanent tables on data lakes is a fabulous feature as well. PySpark is an excellent language to learn and it works really fast with large datasets.
My Lakehouse experiences
- Better performance through consolidating small files in Delta tables
- ACID functionality on Delta tables
- Delta Live Tables
- Make it easier to test features in public preview, like Delta Live Tables.
It has been proposed for use across the whole organization and its different BUs. Databricks will be our main virtualized platform.
It provides very fast data ingestion and reduces the overall ETL window. It integrates different data sources and also helps Machine Learning jobs run and scale. The idea is to reduce overall computation time to save cost on on-prem.
- Data Virtualization
- Spark real-time streaming and batch processing
- Notebooks to run jobs
- Integration of Python and Apache Spark SQL
- SQL Analytics
- SQL Analytics Performance
- Help with migration from RDBMS sources
- To make Transactional OLTP aspects faster
Data for insights
- SQL
- User friendly
- Great development environment
- Errors are not explained
- No data backup feature
- Interface can be more intuitive
It is used as part of solving different data engineering and data analytics use cases in different teams.
Databricks Lakehouse platform provides seamless integration with Azure cloud at Maersk. It uses Spark, MLOps, and Delta to solve recent big data engineering problems.
- Seamless integration with Azure cloud platform services like Azure Data Lake Storage, Blob Storage, Azure Data Factory, and Azure DevOps.
- Databricks Lakehouse platform uses Apache Spark in the backend so that all computation is faster and distributed. It helps data pipelines process huge amounts of big data in less time and at low cost.
- Databricks Lakehouse solves the problems of data lakes by introducing the Delta Lake concept. It provides support for updates, deletes, and schema evolution.
- Databricks Lakehouse platform could provide a better platform for managing and monitoring cluster performance and utilization, with optimization suggestions. That would help developers leverage those insights to build better data pipelines.
- Databricks Lakehouse platform could provide a GUI for creating Spark jobs by click, drag, and drop. That would significantly reduce the time needed to develop code.
- Databricks Lakehouse platform could provide better insights and details regarding job failures and resource consumption.
1. Process different types of data, such as structured, semi-structured, and unstructured data.
2. Process data from different sources like RDBMS, REST APIs, file servers, and IoT sensors.
3. Provide support for updates, deletes, and schema evolution.
Databricks Lakehouse platform is not well suited for the use cases below:
1. Low data volumes with no analytics requirements
Databricks--a good all-rounder
- Complex transformations
- Supports major data sources
- Great performance
- User interface to connect data sources
- Pricing
- Community support
- Ready-to-use Spark environment with zero configuration required
- Interactive analysis with notebook-style coding
- Variety of language options (R, Scala, Python, SQL, Java)
- Scheduled jobs
- Random task failures
- Hard to debug code
- Hard to profile code
Databricks for modern day ETL
Once this raw data is on S3, we use Databricks to write Spark SQL queries and PySpark to process this data into relational tables and views.
Then those views are used by our data scientists and modelers to generate business value in a lot of places, such as creating new models, new audit files, and exports.
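The flow described above (raw files, then cleansed relational tables, then views for analysts) can be sketched locally. This is a minimal illustration only, not the reviewer's actual pipeline: it swaps in Python's built-in `sqlite3` for Spark SQL and an in-memory list for files on S3, and the record layout and the names `parse_raw`, `build_tables`, `orders`, and `customer_totals` are all hypothetical.

```python
import json
import sqlite3

# Hypothetical raw records, standing in for raw files landed on S3.
RAW_LINES = [
    '{"order_id": 1, "customer": "acme", "amount": "120.50"}',
    '{"order_id": 2, "customer": "acme", "amount": "79.50"}',
    '{"order_id": 3, "customer": "globex", "amount": "15.00"}',
    'not valid json',  # malformed input is skipped during cleansing
]


def parse_raw(lines):
    """Parse raw JSON lines into typed rows, skipping malformed records."""
    rows = []
    for line in lines:
        try:
            rec = json.loads(line)
            rows.append(
                (int(rec["order_id"]), str(rec["customer"]), float(rec["amount"]))
            )
        except (ValueError, KeyError, TypeError):
            continue  # drop records that cannot be parsed or lack a field
    return rows


def build_tables(conn, rows):
    """Load cleansed rows into a relational table and expose a summary view."""
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE VIEW customer_totals AS "
        "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
    )


# sqlite3 is used here only as a local stand-in for a Spark SQL engine.
conn = sqlite3.connect(":memory:")
build_tables(conn, parse_raw(RAW_LINES))
totals = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(totals)
```

Analysts would query only the `customer_totals` view, which mirrors the split the review describes between the people who process raw data and the people who consume the resulting views.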
- Process raw data in the One Lake (S3) environment into relational tables and views
- Share notebooks with our business analysts so that they can use the queries and generate value out of the data
- Try out PySpark and Spark SQL queries on raw data before using them in our Spark jobs
- Modern day ETL operations made easy using Databricks. Provides an access mechanism for different sets of customers
- Databricks should come with a fine-grained access control mechanism. If I have tables or views created, the access mechanism should be able to restrict access to certain tables or columns based on the logged-in user
- There should be improved graphing and dashboarding provided from within Databricks
- Better integration with AWS could help me code jobs in Databricks and run them in AWS EMR more easily using better devops pipelines
- Ingestion and cleansing of data
- Interactive Analysis of data
- Development of Analytic Services
- Production Environment Customer Facing Analytic Services
- Collaborative Development Environment using Notebooks.
- Stable and secure cloud development environment requiring minimal DevOps support
- Fast, with excellent scalability that reduces time to market
- Open source library support
- Automation of Machine Learning Development
- Optimization of GPU usage
Databricks Review
- Extremely Flexible in Data Scenarios
- Fantastic Performance
- DB is always updating the system, so we can have the latest features.
- Better Localized Testing
- When they were primarily OSS Spark, it was easier to test and manage releases than with the newer DB Runtime. I wish there were more configuration options in the Runtime, rather than just picking a version.
- Graphing support has become almost non-existent, although it was once one of their compelling features.
- DB generally fits 95% of what you need to do
- Primarily the ability to transform data and/or do ad-hoc DS work
- There is Databricks Community, which is a free version. It gives beginners an easy start with a big data platform. It does not have every feature of the full version but is still adequate for extremely new coders.
- There are many resourceful training elements that are available to developers, data scientists, data engineers and other IT professionals to learn Apache Spark.
- The navigation through which one would create a workspace is a bit confusing at first. It takes a couple minutes to figure out how to create a folder and upload files since it is not the same as traditional file systems such as box.com
- Also, when you create a table, if you forget to copy the link where the table is stored, it is hard to locate it again. Most of the time I would have to delete the table and re-create it.
Databricks Review
[It's] Used by self-service analysts to quickly do analysis
- Very simplified infrastructure initialization
- Seamless and automated optimization of job execution
- Simple tool to get used to
- Visualization - Great area of improvement
- Integration with Git
- COST