Wednesday, July 5, 2023

SXA Search Endpoint Validation

In this article we learn about some opportunities to shield the SXA search endpoints from traffic that can produce noisy log messages.

Background

Our team frequently reviews the reports generated by web application scanning software, an activity I highly recommend you incorporate into your process. In addition, we leverage aggregated log messages in another system (e.g. Elastic, Splunk) to better reveal what IIS and Sitecore experience during the scans.

Research

One issue that seemed particularly interesting relates to the SXA search requests at the relative paths //sxa/search/results and //sxa/search/facets. When the query string is invalid, such as ?abc123 or ?l=abc123, the Sitecore logs will reveal exception messages because the query string can't be properly bound or converted. It's not particularly interesting until you see hundreds or thousands of the messages in a short window. I opened a support ticket with Sitecore and they acknowledged this as a bug (at least an issue up to SXA 10.3). Use reference number 589620 to track the progress of this issue.

Resolution

One solution provided by Sitecore support (during the interim period) is to apply a logging filter to exclude the messages.

An alternative solution is to globally prevent all of the abusive traffic from reaching your servers. If you are leveraging a CDN/Proxy product such as Cloudflare/Akamai you can deploy web application firewall rules to block the undesired traffic.

The following screenshot provides an example of rules you could put in place to reduce the noise. 


The rules perform the following:

  • For each check we ensure that only the /sxa/search/ endpoints are evaluated.
  • Reject query strings whose "L" parameter contains anything other than a possible language (e.g. en-US).
  • Reject query string parameters that do not reasonably match the values SXA is expecting.
  • Reject POST method.
  • Reject query string parameters with commands like "ping".
After making these changes we were able to reduce unnecessary traffic.
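
The screenshot of the rules is not reproduced here, so the following is only a minimal Python sketch of the same checks, not the actual Cloudflare/Akamai rule syntax. The function name should_block, the allow-list of parameter names, the language pattern, and the list of suspicious commands are all assumptions for illustration; adjust them to what your SXA search components really send.

```python
import re
from urllib.parse import parse_qsl

# Hypothetical allow-list of parameter names; replace it with the names
# your own SXA search components actually use.
ALLOWED_PARAMS = {"l", "s", "q", "p", "e", "o", "v", "sig", "itemid"}

# Assumed shape of a language value such as "en" or "en-US".
LANGUAGE = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8}){0,2}$")

# Assumed examples of commands a scanner might inject. Note that this
# would also reject a legitimate search for the word "ping".
SUSPICIOUS = re.compile(r"\b(ping|nslookup)\b", re.IGNORECASE)


def should_block(method, path, query_string):
    """Return True when a request to the SXA search endpoints looks abusive."""
    # Only the /sxa/search/ endpoints are evaluated (extra leading slashes tolerated).
    if not path.lower().lstrip("/").startswith("sxa/search/"):
        return False
    # Reject the POST method.
    if method.upper() == "POST":
        return True
    # keep_blank_values=True keeps bare tokens such as "?abc123".
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        key = name.lower()
        # Reject parameters that do not match what we expect.
        if key not in ALLOWED_PARAMS:
            return True
        # Reject an "l" parameter that is not a plausible language.
        if key == "l" and not LANGUAGE.match(value):
            return True
        # Reject values that contain commands like "ping".
        if SUSPICIOUS.search(value):
            return True
    return False
```

With these assumptions, should_block("GET", "/sxa/search/results", "l=abc123") is rejected while should_block("GET", "/sxa/search/facets", "l=en-US&q=shoes") passes through.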

I hope you found this useful. One day you too might be investigating such a rare and peculiar issue. Good luck!

Tuesday, January 31, 2023

Troubleshoot Certificate Revocation Lookups

In this article we investigate an issue related to LetsEncrypt certificates configured for your web applications and services.


Background

Every so often we would notice in our non-production environment the custom contact forms would fail after the user submitted. These forms are built on top of Sitecore MVC and seemed to fail when the backend code attempted to POST to a service hosted by another team/party.

The error message we would see logged in the Sitecore log files looked like the following:

The remote certificate is invalid according to the validation procedure.

There are a few resources available online that describe possible causes for the error message and ultimately provide some kind of workaround to the problem.

Research

When reviewing our network we could not find anything that immediately stood out as the cause of the issue. If you make the assumption that calls to the web service are not yet happening, then the next logical thing to investigate is the set of steps .NET performs prior to making the outbound requests.

I first confirmed that the web service URL could be reached. In my case the service has a URL to the Swagger UI available and that worked as expected. Second, I checked that the LetsEncrypt root certificate was installed in the Trusted Root store of the server, which it was. Finally, I used certutil to verify the certificate, which revealed some interesting results.

Running the utility output some key information we needed to determine next steps. As you can see in each of the following screenshots, the lencr.org domain is being accessed. Turns out that LetsEncrypt has a series of domains used to verify if a certificate is revoked. Read here for more details.

Error connecting message


Error connecting message

Final error message

Resolution

Ultimately we had the Network/Security Teams put in place firewall rules to allow traffic to the various domains outlined here.

  • *.o.lencr.org
  • *.i.lencr.org
  • *.c.lencr.org
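
When reading the certutil output, it can help to check whether a hostname it reports is covered by the patterns above. The following is a minimal Python sketch, assuming simple shell-style wildcard matching; the function name is_allowed is made up for this example, and your firewall product may interpret wildcards differently.

```python
from fnmatch import fnmatchcase

# The wildcard domains listed above.
ALLOWED_PATTERNS = ("*.o.lencr.org", "*.i.lencr.org", "*.c.lencr.org")


def is_allowed(hostname):
    """Return True when the hostname matches one of the allowed patterns."""
    # Normalize case and drop a trailing dot from fully qualified names.
    host = hostname.strip().rstrip(".").lower()
    # "*" here also matches dots, so nested subdomains are accepted too.
    return any(fnmatchcase(host, pattern) for pattern in ALLOWED_PATTERNS)
```

For example, is_allowed("r3.o.lencr.org") is covered while is_allowed("lencr.org") is not.
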
I hope you found this useful. One day you too might be investigating such a rare and peculiar issue. Good luck!

Monday, January 16, 2023

Working with Unicorn and Sitecore CLI

 In this article we learn about some challenges I faced while working with both Unicorn and Sitecore CLI.

On my project we are at a stage in which a wholesale change from Unicorn to Sitecore CLI is not an option. There is, however, an advantage to updating the deployment pipeline to leverage the IAR plugin for the Sitecore CLI. Our current setup takes upwards of 30 minutes to complete the deployment. Imagine a folder of 40k Unicorn items and trying to zip that up, cleaning the old ones, extracting on the destination server, and then syncing items after every deployment. Deployment nights are exhausting.

Here is a brief list of issues I uncovered along the way.

  • Minor issues in YAML compatibility between Unicorn/Rainbow and Sitecore Rainbow.
  • Content of YAML may cause the creation of Items in Sitecore without a language version.

YAML Compatibility

The majority of the team is still relying on Unicorn to serialize content to disk. If you use Sitecore CLI to also serialize items, including overwriting files previously serialized by Unicorn, you'll notice some fields missing.

In the following partial example, if this was generated by the Sitecore CLI the database name would be missing which could cause an issue with Unicorn.

In the following partial example, if this was generated by the Sitecore CLI the Multilist field type would be missing which causes Unicorn to sync the fields without the pipe delimiter.

Recommendation: If you are going to also use the Sitecore CLI to serialize items, take note of what fields go missing and make sure they are added back.

The command used with the Sitecore CLI to generate IAR items would be one of the following:

  • Sitecore CLI Module -> dotnet sitecore itemres create -o _out/scms -i Scms.* --overwrite
  • Unicorn Folder -> dotnet sitecore itemres unicorn -o _out/scms -p "../Serialization" --overwrite

Language Versions

Most installations of Unicorn will exclude several fields such as __Revision. Its absence from the serialized items (both in IAR form and YAML) may cause the Sitecore Publishing Service to improperly delete items during publishing. Due to this issue both SXA and SPE now include the revisions in the IAR files.

Another interesting discovery is how few fields you may see listed under each language version. In some files generated by Unicorn I noticed no fields listed under the version. I'm not sure if this was a mistake made while developers were merging or something else.

In other cases I found that excluding fields in sitecore.json, as seen in the official docs, causes the YAML serialization and IAR generation to omit fields as well. The one time I actually read the manual I end up introducing an issue. (sitecore.json fields removed for brevity)

With the above configuration all the fields serialized are excluded when interacting with the Sitecore CLI. This certainly becomes an issue for items without other fields (like folders).

No fields equals no language version. No Language version equals a broken SPS. Kittens will cry. Ice cream will melt.

Recommendation: Update Unicorn.config to serialize with the __Revision field and then reserialize everything. An issue was opened with Unicorn to alter the patching behavior, but for now you may wish to make the change directly by manually commenting out the line in the file here.

Run the following SPE report for finding problematic items. Uncomment the section related to language versions missing.

Tuesday, November 29, 2022

Hotfix Wiped Out My Roles

In this article we will see how the Default Identity Provider used with Identity Server learned a new behavior with role management when resolving users on sign in. This impacts Sitecore 10.2 when the cumulative hotfix 10.2.1 is installed.

Back Story

Here is how things played out for us. After we upgraded to Sitecore XM 10.2 we started to experience an issue on startup (typically after the application pool recycled) related to a concurrency issue when loading the IAR files. Specifically an ArgumentNullException is bubbled up to the Sitecore.Data.DataProviders.CompositeDataProvider which causes Sitecore to really struggle. Recycling the application pool once more generally resolves the issue. After opening a support ticket we learned the issue was recently resolved by a cumulative hotfix outlined here. Problem solved!

Unfortunately the hotfix revealed another issue, which relates to how user membership is managed during the sign-in process. We found this during a deployment when users started to complain about not being able to do anything more than log in.

There exists a SignInProcessor which resolves the user, either by accessing the existing user or creating a new one. The hotfix includes a change to the internal code for this processor. Sitecore Support provided some additional details as to why things changed. I'll put into my own words the message they conveyed in the ticket:

The correct behavior for Federated Authentication is to allow the Identity Provider to control the user roles. The original implementation did not adhere to this and as such you could override the roles assigned to users from within Sitecore.

We are using ADFS and have things configured to require users to be in the Active Directory role "Sitecore-Users". We implemented a Sitecore.Owin.Authentication.Services.Transformation to override the claim roles and specifically only add Sitecore\Sitecore Client Users. You can read more about my implementation here. After the user performs an initial login, an Administrator can then assign roles from within Sitecore. This is super helpful as our corporate process for managing access is more tedious/slower than doing so in the Sitecore User/Role Manager.

The Fix

So imagine you are deploying a hotfix late at night and discover this issue. Opening a support ticket is great but obviously won't pull you out of the hole you just dug for yourself and the team. After a few minutes of poking around I narrowed the issue down to Sitecore.Owin.Authentication.Pipelines.CookieAuthentication.SignIn.ResolveUser where the new implementation wrecks our current process. Below is the implementation I extracted and used from the original 10.2 version.


Here's the code:

If you do come across this issue you may be on Sitecore XM 10.3+ and find that a new setting is available to restore the original functionality.

<clearroleswhensignin>false</clearroleswhensignin>

Let me know if this helps you out. Thanks for reading!

References

  • https://sitecore.stackexchange.com/questions/31725/how-to-configure-default-roles-when-using-identity-server-integrated-with-adfs-o
  • https://support.sitecore.com/kb?id=kb_article_view&sysparm_article=KB1001823

Saturday, August 27, 2022

Replacement Task Scheduler for Sitecore

In this article we discuss some of the challenges with the out-of-the-box Task Scheduler included with Sitecore and see how you can replace it with Hangfire, a product to perform background processing for .Net applications.

Update: A NuGet package is available for download here.


There are a ton of articles describing the Task Scheduler, and they oftentimes cover the same information. Below are a few to get you started:

Oddly the only thing I can find on the Sitecore docs site is from the old SDN. I'll skip linking that here because it is likely to break.

Two of the issues you'll find with the Task Scheduler are the inability to run at a specific time and, if Sitecore shuts down, that the missed tasks are likely to run immediately following startup. With the use of Hangfire we'll address both issues. The format of the schedule field is also a bit crazy and so we'll add to the complexity by including support for the cron format.

So why not SiteCron? You should use it. If however you can't use it, don't want to use it, or simply can't make up your mind then feel free to give this a try.

Here is a quick breakdown of what we'll build:

  1. Pipeline processor inheriting from Sitecore.Owin.Pipelines.Initialize.InitializeProcessor which should give us access to IAppBuilder. Here we'll register Hangfire and handle scheduling of jobs.
  2. Configuration patch to disable the agent used for scheduled tasks and to register our new processor. There are multiple agents used for scheduled tasks, so at the moment we'll only focus on the one that runs for master in the Standalone and ContentManagement roles.

The important part


Sitecore configuration patch to register the code:

Explanation:
  • Register the processor in the owin.initialize pipeline. The first option allows you to provide the amount of time to delay running scheduled tasks after Sitecore starts up; this is extremely helpful if you want to disable jobs and need some extra time or the jobs are process-intensive and you want to give Sitecore time to warm up. The second option allows you to adjust the frequency at which schedules configured in Sitecore (from the Content Editor) are updated in Hangfire; this is important for cases where Admins create/update/remove scheduled tasks.
  • Disable the Master_Database_Agent so the old scheduler doesn't run.
Sitecore processor:

Explanation:
  • You'll need references to Hangfire.AspNet, Hangfire.Core, and Hangfire.MemoryStorage found on NuGet. I went with in-memory storage because I like things simple and having to set up connections to a database and custom tables sounds like a pain. Also, I feel like looking at the LastRun field was enough to satisfy what I needed.
  • In the Process method we do the basic configuration for Hangfire. I don't care about the dashboard like described here and here. Once again, I like things simple and adding another thing to maintain is just not for me. If you want to then go knock yourself out. We also schedule a fire-and-forget background job to initialize the scheduler which includes registering Scheduled Tasks defined in Sitecore.
  • Once the first background job runs, which may execute tasks that missed their scheduled run time, the recurring jobs are configured. Any scheduled task with an empty/invalid schedule is ignored. The creation of the recurring jobs is chained to the Initialize method so we can ensure that anything missed runs before we schedule others. This may be a problem if you have really long running jobs.
  • The default schedule format {start timestamp}|{end timestamp}|{days to run bit pattern}|{interval} is compatible but converted to a cron schedule format when registered (see GetSchedule method). The interval may be converted to a precise time when using the daily TimeSpan format. For example, the value 1.04:30:00 will run daily at 4:30 AM. The conversion code produces a cron format 30 4 * * * which is what ultimately gets provided to Hangfire. If you want to replace the default schedule format with a cron schedule format go right ahead as both formats are supported. The only problem I saw with this is when using the SPE Task Manager because the tool doesn't know anything about cron.
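
The conversion from the default schedule format to cron can be illustrated with a minimal Python sketch. This is not the GetSchedule method from the processor; it only handles the daily case described above, assuming that a day component of 1 in the interval means "run daily at this time". The function names and the sample timestamps are made up for illustration.

```python
import re

# .NET TimeSpan text such as "1.04:30:00" (days.hours:minutes:seconds)
# or "00:05:00" (no day component).
_TIMESPAN = re.compile(r"^(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})$")


def interval_to_cron(interval):
    """Convert a daily TimeSpan interval to a cron expression, else None."""
    match = _TIMESPAN.match(interval.strip())
    if not match:
        return None
    days, hours, minutes, _seconds = (int(g or 0) for g in match.groups())
    # Assumption: exactly one day means "daily at hours:minutes".
    if days == 1 and hours < 24 and minutes < 60:
        return f"{minutes} {hours} * * *"
    # Other intervals (e.g. every five minutes) are not handled in this sketch.
    return None


def schedule_to_cron(schedule):
    """Read the interval (fourth part) of the pipe-delimited schedule."""
    parts = schedule.split("|")
    if len(parts) != 4:
        return None
    return interval_to_cron(parts[3])


# Illustrative schedule value; the timestamps are placeholders.
print(schedule_to_cron("20000101T000000|99990101T000000|127|1.04:30:00"))
```

Intervals without a day component return None here, so a fuller implementation would still need a rule for those and for the days-to-run bit pattern.
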
A simple report can be written with Sitecore PowerShell Extensions to display the current configured recurring jobs.


We've had it running for quite some time now and it has been a game changer for us. Think of all the scheduled tasks that seem to randomly run during deployments or competing with processing power needed by Content Editors.

Give it a try and let me know how it works out for you on your projects. Feedback welcome.

More Hangfire:

Disclaimer: After sharing this post I was reminded of an important distinction one should make with bolting on more features to the Sitecore platform. I found Hangfire helped solve an issue we were having with our scheduled tasks. Every day we run tasks that execute SPE scripts and these can be quite CPU intensive. The data belongs in Sitecore and the tasks need to be run outside of business hours (late night or early morning). As a separate application we built a sync with Quartz.NET using .NET 6 (the latest LTS version at the time), which dramatically improved the developer experience, performance, and maintainability.

Wednesday, May 4, 2022

Windows Hosts Writer 2.0

 Today I'm pleased to announce the release of Windows Hosts Writer 2.0! In this article we'll learn about what is new and how to get started. Also, a reminder about the pain of managing the hosts file.



There are many steps needed to get a website running locally. For traditional sites hosted with IIS you have to add application pools, websites, perhaps configure service accounts...the list goes on. One of the tasks that feels the most tedious is adding entries to the hosts file. If you are not familiar, this is an extension-less file found at C:\Windows\System32\drivers\etc\hosts and contains IP-to-Hostname mappings. Each row will have an IP address such as 127.0.0.1 followed by the hostname such as scms.dev.local.
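
To make the row format concrete, here is a minimal Python sketch that reads one line of a hosts file. It is not how Windows Hosts Writer is implemented; the function name parse_hosts_line is made up for illustration, and it assumes that "#" starts a comment and that several hostnames on one row are separated by whitespace.

```python
def parse_hosts_line(line):
    """Return (ip_address, [hostnames]) for a mapping row, else None."""
    # Drop anything after a comment marker and trim whitespace.
    content = line.split("#", 1)[0].strip()
    if not content:
        return None
    # First token is the IP address; the remaining tokens are hostnames.
    ip_address, *hostnames = content.split()
    if not hostnames:
        return None
    return ip_address, hostnames


print(parse_hosts_line("127.0.0.1 scms.dev.local"))
```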

If you open the hosts file you may already see something like the following:


As you can see from the above image, there is not much going on. This file comes in handy when you want a fancy URL to loop back to the local machine. So now you must be wondering how Windows Hosts Writer (WHW) relates to this. When paired with Docker it can save a tremendous amount of time in managing the ever-changing IP addresses.

A while back Rob introduced the Sitecore community to a neat tool distributed through its own Docker container. Check out his series of articles detailing the improvements made over time.

Finally, what is so important about this new 2.0 version of WHW? I'm glad you asked before bailing out on this lame blog post. Below is a breakdown of all the goodies.

  • Support for .NET Core 3.1 ends December 13, 2022, while 6.0 is the latest version released with LTS. We went ahead and upgraded before we forget.
  • Fixed an issue with the TERMINATION_MAP feature. 🥇
  • Aliases that are space-delimited are treated like all the other host entries.
  • Consolidated the Dockerfile which enables contributors to debug locally with Visual Studio, build from docker-compose.yml, and ensure @RAhnemann can still do releases. 👍🏼
  • Enriched the readme with helpful details on getting started. Run docker compose up -d from the root directory to try it out.
  • Updated the referenced Docker.DotNet assembly to address the dreaded exception Docker.DotNet.DockerApiException: Docker API responded with status code=BadRequest, response=400 Bad Request. This might have been revealed after upgrading Docker to 4.7.1.
If you want to get started, check out the readme on GitHub. The Docker images have been pushed to DockerHub here.

Friday, August 14, 2020

Unicorn Serialization for SXA Projects Part 2

 In a previous article I shared how I configure Sitecore projects with Unicorn when developing sites for SXA. In this article I share some improvements that can be made to simplify the number of configuration files you have to manage.

At this point you may have heard that Sitecore 9+ includes a capability for altering configuration files using custom defined application settings. Kamruz Jaman has written a detailed article covering many aspects of the features.

Let's have a look at what can be added to your configurations to control them per environment.

Begin by starting with a structure similar to this:



The part that is new and most interesting is the namespace added called environment. For this to function we need the "Web.config" to define it.


This is going to tell Sitecore that we have a namespace we intend to use in our configuration patches and for this transformed Web.config the value is "Dev". I like to use "Dev", "Int", "Tst", "Prd" for the environment names.


Let's take a look at a more complete configuration for use with Unicorn.


You'll notice that the include statements have duplicates such as "Forms" and "Content". This is possible because the environment values ensure that only one of each duplicate appears at any given time.
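
The original configuration is not reproduced here, so the following Python sketch only illustrates the idea of duplicates being filtered by environment. It assumes the patch uses an environment:require attribute bound to Sitecore's rule-based configuration namespace; the sample include names and paths are hypothetical, and the filter handles only single values and simple "or" lists rather than everything Sitecore supports.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the custom "environment" rule namespace.
ENV_NS = "http://www.sitecore.net/xmlconfig/environment/"

# Hypothetical Unicorn includes; "Forms" is listed twice on purpose.
SAMPLE = """
<predicate xmlns:environment="http://www.sitecore.net/xmlconfig/environment/">
  <include name="Forms" database="master" path="/sitecore/Forms" environment:require="Dev" />
  <include name="Forms" database="master" path="/sitecore/Forms/Shared" environment:require="Int or Prd" />
  <include name="Templates" database="master" path="/sitecore/templates/Project" />
</predicate>
"""


def active_includes(config_xml, environment):
    """Return (name, path) for includes that apply to the given environment."""
    root = ET.fromstring(config_xml.strip())
    active = []
    for include in root.findall("include"):
        require = include.get(f"{{{ENV_NS}}}require")
        # No requirement means the include applies everywhere.
        if require is None:
            active.append((include.get("name"), include.get("path")))
            continue
        allowed = {value.strip() for value in require.split(" or ")}
        if environment in allowed:
            active.append((include.get("name"), include.get("path")))
    return active


print(active_includes(SAMPLE, "Dev"))
```

For any one environment value only a single "Forms" include survives, which is why the duplicate names do not collide.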


I certainly hope you found this to be helpful. Happy 2020!